Mohit Ranka

RDBMS vs. NOSQL?


From our own experience designing and operating a highly available, highly scalable ecommerce platform, we have come to realize that relational databases should only be used when an application really needs the complex query, table join and transaction capabilities of a full-blown relational database. In all other cases, when such relational features are not needed, a NoSQL database service like DynamoDB offers a simpler, more available, more scalable and ultimately a lower cost solution.

            — Werner Vogels, CTO Amazon.com on when to use RDBMS

I came across this quote in an article while researching DynamoDB. As much as I respect Werner, I would suggest taking his recommendation on database selection criteria with a pinch of salt, due to the obvious conflict of interest: he has a database to sell.

Most products do not need a scale that relational databases cannot serve. Relational databases have survived for more than 20 years and are still doing well, even at large scale! There are extremely mature open source RDBMSs, requirement-independent schemas, optimized queries for filtering, aggregating and listing data, great communities, accumulated knowledge, default integration with programming frameworks and lots of mature tooling.

RDBMS are great-for-all, unless…

Relational systems are super easy to work with, have great (free) tools for monitoring, backup and operations, and work well for most use cases. There is a lot of help around (community, web resources, consulting) and, best of all, you already know them well. Unless I had one of the following requirements, I would use a single-node relational database.

  • Super high availability

A relational database's server process is a single point of failure, which means your system will go down when the process crashes or is taken down for scheduled or unscheduled maintenance.

If you are building a system that needs close to 100% availability, look into clustered relational databases or distributed databases that choose availability over consistency, e.g. Cassandra.

  • Flexible schema

One of the great strengths of relational databases is the schema, data model and related concepts, which let you model a schema without knowing the query patterns in advance, and this works well. However, not all data can be effectively represented in the relational data model. Even though there is some support for flexible-schema data structures, relational databases are most efficient with a fixed schema that conforms to normalization rules, which trade off consistency against performance. Flexible schemas do not work well with relational databases.

If you need (do you, really?) a flexible schema, look at the NoSQL options.
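To make the tradeoff a bit more concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `products` table, its columns and the sample rows are all made up for illustration. The fixed `price_cents` column can be filtered by the database itself, while the free-form attributes are stored as JSON text in an `extra` column and, in this sketch, have to be parsed and filtered in application code.

```python
import json
import sqlite3

# Hypothetical table: two fixed columns plus a free-form JSON text column.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE products (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        price_cents INTEGER NOT NULL,   -- fixed, typed column
        extra TEXT NOT NULL             -- flexible attributes as JSON text
    )
    """
)

# Made-up sample data; each product carries different extra attributes.
rows = [
    ("keyboard", 4500, {"layout": "US", "wireless": True}),
    ("novel", 1200, {"author": "A. Writer", "pages": 320}),
    ("mouse", 2500, {"wireless": False}),
]
conn.executemany(
    "INSERT INTO products (name, price_cents, extra) VALUES (?, ?, ?)",
    [(name, price, json.dumps(extra)) for name, price, extra in rows],
)

# Fixed column: the database does the filtering (and could use an index).
cheap = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM products WHERE price_cents < 3000 ORDER BY name"
    )
]

# Flexible attribute: every row is fetched and parsed in the application.
wireless = [
    name
    for name, extra in conn.execute(
        "SELECT name, extra FROM products ORDER BY name"
    )
    if json.loads(extra).get("wireless")
]

print(cheap)
print(wireless)
```

Some relational databases offer JSON functions that can push part of this work back into the query, but the fixed column remains the path the engine is built to optimize.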

  • Horizontal scaling

If you are looking to build a system that should scale horizontally as your data needs grow, a single-node relational database will not suffice forever. There is only so much room to grow, so much RAM and CPU to throw at a single machine. Sooner or later, you will hit that single-machine limit. At that point, you will have to choose either a NoSQL solution or a relational database cluster.
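As a rough illustration of what spreading data over several machines involves, here is a minimal sketch of hash-based routing, one common way to decide which node holds a given record. It is not how any particular database does it: the node names and the `shard_for` helper are hypothetical.

```python
import hashlib

# Hypothetical node names; a real deployment would hold connection details.
SHARDS = ["db-node-0", "db-node-1", "db-node-2"]


def shard_for(key: str, shards=SHARDS) -> str:
    """Map a record key to one node using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes as an integer, then wrap it onto the node list.
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


# The same key always routes to the same node.
print(shard_for("customer-42") == shard_for("customer-42"))
```

Note that with this simple modulo scheme, changing the number of nodes remaps most keys, and queries that span several keys now have to touch several machines. That operational complexity is part of why I would not go down this road until a single node genuinely stops being enough.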

Epilogue

NoSQL is no panacea, and an RDBMS is still the best choice for most applications. Unless you have a strong need not to use a relational database, stick with it.
