Tuesday, June 07, 2011

Is SQL really the problem?

There is a lot of buzz about NoSQL out there. According to the hype, NoSQL solves scalability issues, NoSQL simplifies development, NoSQL has unlimited write performance, etc. While these claims may technically be true, they are in no way exclusive to the NoSQL offerings out there. All of these things are achievable in a SQL database with full ACID and relational capabilities. In a recent post, Derrick Harris asks “Will Scalable Data Stores Make NoSQL a Non-Starter?”. He makes the point that scalable SQL is a reality today, so why would someone go to a NoSQL solution? What benefit could it possibly bring? Would anyone choose to give up transactions if they didn’t have to? Would anyone choose to give up ACID properties? Would anyone choose to give up the ability to do relational operations? How would giving up any of these things simplify development? Looking at the comments on Derrick’s post provides a clue. They mention the “flexible schema” as the key feature.

A flexible schema is defined by the ability to assign arbitrary properties to an object without having to define columns for those properties in advance. For example, in a single table, object A could represent a picture and have “height” and “width” properties while object B could represent an audio stream that has “length” and “bitrate” properties. In a traditional RDBMS environment, you’d probably create a separate table for each object type, which can be inconvenient and possibly inefficient.

Is that it? Why wouldn’t you just add that feature to a scalable SQL database rather than throw away all the good things SQL databases have to offer? You could easily add a “map” column to a table that allows you to store an arbitrary map of key/values associated with an object. You now have the tools to implement a flexible schema while maintaining relational capabilities, ACID properties, and full scalability. It’s the best of both worlds.
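To make the “map” column idea concrete, here is a minimal sketch in Python. It is only an illustration under several assumptions of mine: it uses an in-memory SQLite database as a stand-in for a scalable SQL store, encodes the map as JSON text, and the table name `media`, the column name `props`, and the helper functions are all hypothetical.

```python
import json
import sqlite3


def make_store():
    """Create an in-memory database with one table for every object type."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE media ("
        " id INTEGER PRIMARY KEY,"
        " kind TEXT NOT NULL,"    # ordinary relational column
        " props TEXT NOT NULL)"   # hypothetical "map" column, stored as JSON text
    )
    return conn


def add_object(conn, kind, props):
    """Insert one object with arbitrary key/value properties in a transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "INSERT INTO media (kind, props) VALUES (?, ?)",
            (kind, json.dumps(props)),
        )
    return cur.lastrowid


def get_props(conn, object_id):
    """Return the property map for one object, or None if it does not exist."""
    row = conn.execute(
        "SELECT props FROM media WHERE id = ?", (object_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None


conn = make_store()
picture_id = add_object(conn, "picture", {"height": 600, "width": 800})
audio_id = add_object(conn, "audio", {"length": 215, "bitrate": 128})

# The fixed columns remain queryable with plain SQL.
counts = conn.execute(
    "SELECT kind, COUNT(*) FROM media GROUP BY kind ORDER BY kind"
).fetchall()
```

The picture and the audio stream live in the same table with different properties, while inserts stay transactional and the `kind` column can still be grouped, joined, and indexed like any other.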

Here are some signs that you may actually NEED a NoSQL solution:

  1. You noticed a lot of your database fields are really serialized complex objects in disguise. Why bother with an RDBMS at all then? Storing serialized objects in a relational database is like being on the pill while trying to get pregnant: a bit counterproductive. Just use a schemaless database from the start.
  2. Using a standard query language has become too confining. You just want to be free. SQL is so easy, so convenient, and so standard, it's really not a challenge anymore. You need to be different. Then NoSQL is for you. Each product has its own completely different query mechanism.
  3. Your toolbox only contains a hammer. Hammers, while wonderfully versatile, cannot make a nice latte. One size doesn't fit all. Right tool for the right job and all that jazz.
  4. You really feel like protesting something, but all the really cool causes are full up. Protesting the many centuries of relational database hegemony is a first rate cause you can be proud of. Think of all the stirring chants: "Let My Schema Go!" or "Give Me Primary Key Access or Give Me Indexes!" or "Don't Join On Me!"
  5. You stepped in a giant pile of impedance mismatch and need to wash off your shoes. Maybe whatever product you are trying to build would be better represented by a graph? Or a document model? Or a super model? Stop viewing the world through table-colored glasses.
  6. Maintaining a completely separate object caching system on top of an already beefy table storage system has started to seem a little silly. It's a massive duplication of effort and resources, and the consistency problems are brutal.
  7. The Four Horsemen contract you to build a fast and infinitely scalable website to help crowdsource their new startup: EndOfTheWorld.me.

There is a wealth of excellent NoSQL documentation out there, and perhaps one of the first things to learn is that the different types of technologies solve very different problems. Cassandra is not the same as Voldemort (not to be confused with Harry Potter’s archnemesis), and MongoDB is different yet again. None of these systems can be considered a direct replacement for traditional relational systems such as SQL Server or MySQL, either, but they do solve very specific problems that many web sites and services face as their tools are adopted by more people.

One interesting item worth noting is that many of these technologies, while still very young, are incredibly robust when used properly. Organizations like Google, Twitter, and Zynga would never have employed these systems if they weren’t up to the task. The key, of course, is ensuring the systems that will access this database are created with NoSQL in mind right from the start. As scaling technologies continue to advance and grow more complex, the days of simply bolting something onto the side of an application are coming to a close.

Do you use NoSQL in any of your projects? Has it proven to be faster or more reliable than typical RDBMS tools? I’d love to hear your thoughts on the matter.

See you at OSCON the last week in July!

OSCON 2011

