Created by Maximilian Schwarzmüller
Last Updated on July 25, 2018

# Overview

After taking your first steps into web development, you’ll relatively early be confronted with one choice: Should you work with a SQL database like MySQL or a NoSQL database like MongoDB?

Especially if working with Node.js you could quickly get the impression that NoSQL (MongoDB) is strictly better.

This, however, is wrong!

In this article, I’ll quickly introduce you to the core concepts of both database worlds (SQL and NoSQL) and then highlight the differences and advantages or disadvantages of each solution.

Not a reader? Watch the video above this article!

# SQL Databases

SQL stands for Structured Query Language and it therefore is not a database itself but only a query language you can use to interact with a specific type of database.

SQL allows you to store, update, delete and of course retrieve data from relational database management systems (RDBMS).

Such databases have two key characteristics:

  • Data is stored in database tables by following a strict data schema (= structure)
  • Data is distributed across multiple tables which are connected via relations

# Strict Schema

Data is stored as records in tables and each table has a clearly defined structure - a set of fields which defines which data may go into the table and which data may not.

The structure is defined regarding the names of the fields as well as the datatypes. Tables hold records which follow a clearly defined schema of fields. You can’t add records which don’t adhere to this schema. Got more fields? Sorry, you have to pick another table. Missing some field data? Not your table then!

# Relations

The other important part of SQL-based databases are relations.

You split data into multiple tables so that you avoid data duplication. Therefore, you’ll have a Users, Products and Orders table but each table will only hold data that is not stored in one of the other tables. Data is split into multiple, connected tables. This clear structure can have advantages - you won’t end up with incorrect data in one of your tables whilst having correct one in all others. That’s a case you won’t run into because data is always only managed in one table, it is not duplicated across tables.

# NoSQL Databases

NoSQL is named like this because it basically follows the opposite approach of SQL databases.

  • No schemas
  • No relations

That’s how you could summarize it.

Instead, you structure data in collections (= tables in the SQL world). Records are now called documents.

But it’s not just about the naming, there is a core difference: You can put data of different structure into the same collection. You couldn’t do that in the SQL world - each table had a clear schema there. Data is organized in collections and documents - the most striking difference to SQL databases is the missing schema. Documents look a bit like JSON data - and as mentioned, you don’t need to worry about any schema.

Additionally, you typically put related data into the same collection. So if you got a bunch of orders, you would store your order documents in your Orders collection by including all data you typically query.

So you would end up with documents that already got everything you need - no need to join multiple tables/ collections.

Indeed, NoSQL databases don’t know the concept of joining tables/ collections.

You can manually do that (by retrieving a foreign id in collection A and looking it up in collection B) but this will not be your typical flow.

Instead, you duplicate data across collections so that each collection yields exactly the data some part of your app might be looking for. Collections and documents in collections are unrelated. Data is duplicated if needed in multiple collections. The concept of data duplication looks disturbing at first. Doesn’t that introduce the danger of making mistakes and updating data in collection A without adjusting it in collection B, too?

It does and it will be your job to ensure that data updates are performed in all collections that use a certain data piece.

But the huge advantage is that you don’t need to work with complex (and at some point slow) join statements. All the data is already stored in the structure you need it in.

This is especially great for data that doesn’t change a dozen times every second.

# Vertical & Horizontal Scaling

There’s also one other important concept we have to have a look at when comparing databases: Scaling.

How well can you scale your database?

With that I mean how many read and write requests is your database able to handle. Which amounts of data can it work with?

When it comes to scaling, we can differentiate between vertical and horizontal scaling.

  • Vertical Scaling means that we simply increase the power of the database server - e.g. by upgrading its CPU.
  • Horizontal Scaling on the other hand means that more servers are added and the database is distributed across them. Hence you still work with one database but multiple servers that host it. Horizontal scaling adds more servers, vertical scaling more power to a single server. Due to the way data is stored (related tables vs unrelated collections), SQL databases generally support vertical scaling only - horizontal scaling is only possible for NoSQL databases.

SQL databases do know the concept of sharding but it comes with certain restrictions and is typically hard to implement. NoSQL databases natively support this and therefore make it way easier to split your database across multiple servers.

# The Right Choice

With the brief overview given, the question is: Which database solution should you work with?

There is no clear winner!

Both SQL and NoSQL are absolutely viable solutions. But they’re solutions to different problems - it comes down to your data and application in the end.

Let’s summarize the key advantages of both approaches:

SQL

  • Clearly defined schema, data integrity is ensured
  • Relations allow you to store each data only once - no duplicates

NoSQL

  • Absence of a schema gives you more flexibility - you can adjust your stored data at any point and introduce new “fields”
  • Data is stored in the format your app needs it - this speeds up fetching the data
  • Vertical and horizontal scaling is possible, hence your database will be able to handle any amount of read/ write requests your app throws at it

And the disadvantages:

SQL

  • Less flexibility, data schema needs to be known and planned in advance (adjusting it later is difficult or maybe even impossible)
  • Relations can lead to very complex queries with a lot of JOIN statements
  • Horizontal scaling is hard, often only vertical scaling is possible - this means that you’ll face some growth limits (regarding throughput you can handle/ performance) at some point

NoSQL

  • Increased flexibility might lead you to work sloppy and postpone data structure decisions
  • Duplicate data means that you have to update multiple collections and documents if that data changes - not just one record in one table as you would do it in the SQL world

So when might SQL be best?

  • You got related data, used in different “chunks” in different parts of your app, that changes relatively often (you would have to update multiple collections all the time in a NoSQL world)
  • A clear schema is important to you and your data is unlikely to change (drastically)

When is NoSQL best?

  • Exact data requirements or the data itself is unknown or subject to change/ expand
  • You require high (read) throughput but you won’t change your data that often (i.e. you don’t need to update dozens of documents for one change all the time)
  • You need to scale your database horizontally (i.e. you store enormous amounts of data and have huge read and write throughput)

Obviously, you can structure your database in different ways. And depending on your structure, you might end up with a NoSQL database that mitigates some of the issues (e.g. you reduce the amount of duplicate write requests). The same is true for SQL databases: You can choose a structure that meets your data requirements and doesn’t lead to overly complicated JOIN statements.