Thursday, January 19, 2012


Anyone who's been around for a little while knows that the idea of storing data outside of a relational database is hardly new. Unix, for example, stores essentially all of its data in flat files, often with keys. Have you looked at your /etc/passwd file lately? It's just a little database, and it certainly doesn't use SQL.

So I'm confused as to why the idea of a non-relational database is being treated by so many people as a novel idea. It's as if a generation of developers wrote a few web systems using the LAMP stack, noticed that MySQL wasn't ideal for everything, and concluded that the whole relational model was at fault and needed to be replaced. Is this increased attention simply due to the fact the this collection of non-relational data stores now has a name?

I think that we've all worked with a relational database that is big and slow. In my case, that would describe maybe half the databases that I've encountered. But those databases aren't big and slow because the relational model is bad. They're big and slow either because they aren't designed very well or because they are being used for something that's not a good fit for the relational model.

The fact that there are some relational databases that are slow and some use cases that don't call for a relational database doesn't mean that the relational model is inherently flawed. It just means that it's not the right solution for everything, which is something that can be said about every technology out there. Every technology is a compromise because, as is the case with almost everything that exists in the real world, optimizing the design to improve certain characteristics inevitably degrades others.

Even a cursory review of the available NoSQL databases strongly reinforces the notion that they are not overall better than relational databases, just different. They have been optimized for different use cases, and as a result, they are better-suited for some use cases than regular relational databases, but less suitable for others. And the optimizations tend to follow a pattern: almost all of the NoSQL offerings trade off functionality for speed. I can understand the motivation behind that, since relational databases are most often criticized for being slow. But it's important to understand what's being traded. MongoDB, for example, does not support multi-document transactions or queries. By omitting support for those features, MongoDB is able to deliver better performance for the features it does support. If you're basically storing independent documents and don't need to relate them in any way, then great. But if you want to make an atomic multi-document update at some point, you'll be out of luck. You'll have to roll your own transactions with version numbers or lock flags. VoltDB also trades off flexibility for speed. It's way faster than any traditional relational database, but that's because it runs entirely in memory and doesn't support locking or concurrency of any kind. If you're working with a relatively small data set and are only executing fairly simple queries, then you're golden. But it also has limitations that general-purpose databases don't.

I think it's also important to keep in mind that relational databases didn't spring up from nowhere. There were non-relational data stores like IMS long before there were relational databases. Relational databases were developed in response to the perceived flaws of those earlier database technologies. We've been here before.

So when I need to store some data, I'll probably think relational first. But since I might encounter a use case that would benefit from a different type of data store, I'll be happy to have NoSQL in my toolbox.


Post a Comment

<< Home