After taking your first steps into web development, you'll relatively early be confronted with one choice: Should you work with a SQL database like
Especially if working with Node.js you could quickly get the impression that NoSQL (MongoDB) is strictly better.
This, however, is wrong!
In this article, I'll quickly introduce you to the core concepts of both database worlds (SQL and NoSQL) and then highlight the differences and advantages or disadvantages of each solution.
SQL stands for Structured Query Language and it therefore is not a database itself but only a query language you can use to interact with a specific type of database.
SQL allows you to store, update, delete and of course retrieve data from relational database management systems (RDBMS).
Such databases have two key characteristics:
Data is stored in database tables by following a strict data schema (= structure)
Data is distributed across multiple tables which are connected via relations
Data is stored as records in tables and each table has a clearly defined structure - a set of fields which defines which data may go into the table and which data may not.
The structure is defined regarding the names of the fields as well as the datatypes.
You can't add records which don't adhere to this schema. Got more fields? Sorry, you have to pick another table. Missing some field data? Not your table then!
The other important part of SQL-based databases are relations.
You split data into multiple tables so that you avoid data duplication. Therefore, you'll have a
Orders table but each table will only hold data that is not stored in one of the other tables.
This clear structure can have advantages - you won't end up with incorrect data in one of your tables whilst having correct one in all others. That's a case you won't run into because data is always only managed in one table, it is not duplicated across tables.
NoSQL is named like this because it basically follows the opposite approach of SQL databases.
That's how you could summarize it.
Instead, you structure data in collections (= tables in the SQL world). Records are now called documents.
But it's not just about the naming, there is a core difference: You can put data of different structure into the same collection. You couldn't do that in the SQL world - each table had a clear schema there.
Documents look a bit like JSON data - and as mentioned, you don't need to worry about any schema.
Additionally, you typically put related data into the same collection. So if you got a bunch of orders, you would store your order documents in your
Orders collection by including all data you typically query.
So you would end up with documents that already got everything you need - no need to join multiple tables/ collections.
Indeed, NoSQL databases don't know the concept of joining tables/ collections.
You can manually do that (by retrieving a foreign id in collection A and looking it up in collection B) but this will not be your typical flow.
Instead, you duplicate data across collections so that each collection yields exactly the data some part of your app might be looking for.
The concept of data duplication looks disturbing at first. Doesn't that introduce the danger of making mistakes and updating data in collection A without adjusting it in collection B, too?
It does and it will be your job to ensure that data updates are performed in all collections that use a certain data piece.
But the huge advantage is that you don't need to work with complex (and at some point slow) join statements. All the data is already stored in the structure you need it in.
This is especially great for data that doesn't change a dozen times every second.
Vertical & Horizontal Scaling
There's also one other important concept we have to have a look at when comparing databases: Scaling.
How well can you scale your database?
With that I mean how many read and write requests is your database able to handle. Which amounts of data can it work with?
When it comes to scaling, we can differentiate between vertical and horizontal scaling.
Vertical Scaling means that we simply increase the power of the database server - e.g. by upgrading its CPU.
Horizontal Scaling on the other hand means that more servers are added and the database is distributed across them. Hence you still work with one database but multiple servers that host it.
Due to the way data is stored (related tables vs unrelated collections), SQL databases generally support vertical scaling only - horizontal scaling is only possible for NoSQL databases.
SQL databases do know the concept of sharding but it comes with certain restrictions and is typically hard to implement. NoSQL databases natively support this and therefore make it way easier to split your database across multiple servers.
The Right Choice
With the brief overview given, the question is: Which database solution should you work with?
There is no clear winner!
Both SQL and NoSQL are absolutely viable solutions. But they're solutions to different problems - it comes down to your data and application in the end.
Let's summarize the key advantages of both approaches:
Clearly defined schema, data integrity is ensured
Relations allow you to store each data only once - no duplicates
Absence of a schema gives you more flexibility - you can adjust your stored data at any point and introduce new "fields"
Data is stored in the format your app needs it - this speeds up fetching the data
Vertical and horizontal scaling is possible, hence your database will be able to handle any amount of read/ write requests your app throws at it
And the disadvantages:
Less flexibility, data schema needs to be known and planned in advance (adjusting it later is difficult or maybe even impossible)
Relations can lead to very complex queries with a lot of
Horizontal scaling is hard, often only vertical scaling is possible - this means that you'll face some growth limits (regarding throughput you can handle/ performance) at some point
Increased flexibility might lead you to work sloppy and postpone data structure decisions
Duplicate data means that you have to update multiple collections and documents if that data changes - not just one record in one table as you would do it in the SQL world
So when might SQL be best?
You got related data, used in different "chunks" in different parts of your app, that changes relatively often (you would have to update multiple collections all the time in a NoSQL world)
A clear schema is important to you and your data is unlikely to change (drastically)
When is NoSQL best?
Exact data requirements or the data itself is unknown or subject to change/ expand
You require high (read) throughput but you won't change your data that often (i.e. you don't need to update dozens of documents for one change all the time)
You need to scale your database horizontally (i.e. you store enormous amounts of data and have huge read and write throughput)
Obviously, you can structure your database in different ways. And depending on your structure, you might end up with a NoSQL database that mitigates some of the issues (e.g. you reduce the amount of duplicate write requests). The same is true for SQL databases: You can choose a structure that meets your data requirements and doesn't lead to overly complicated