October, 2015
I'm not sure what qualifies as "grizzled", but I've worked with relational databases for most of my 23 year career in tech. I've worked primarily as a designer and developer but have also worked for short periods as an acting DBA. I've worked with most platforms including DB2, Oracle, MySQL, Postgres, Sybase and SQL Server. The first reference book on DB2 that I used was written by Date & Codd, two pioneers of the relational model.
I want to describe our experience of migrating the Watercooler Analytics platform from a relational model (SQL Server) to a NoSQL model (MongoDB).
Firstly, I want to summarize our experience. The migration was quite difficult but well worth the effort. In retrospect, it's a bit odd that so much of the tech world depends on a database architecture (relational) that's been with us since the late 60s. I would also say that I would be very reluctant to ever build another application using a relational database. These posts won't be an academic comparison of relational vs. NoSQL, but rather a story of why we switched and why it's a pragmatic option.
Secondly, I want to describe why you might want to consider a NoSQL database. To me, the best database for a task is one that can model the data in the form closest to its "natural state". For example, imagine trying to model the human body --- the body has a head, torso, arms and legs. That's a start, but let's take it down a level --- an arm has a hand, a forearm and an upper arm. Now, let's go deeper --- a hand has fingers, fingers have ligaments, tendons and bones. You get the point. Now imagine, trying to model this in a spreadsheet or a relational database. It's not an easy task because the data is not naturally broken into rows and columns. Of course, one can break it into rows and columns but the problem is that there is no abstraction of the complexity --- every table and every column has equal "weight".
For the same reason that object oriented programming is useful since one can encapsulate (hide) complexity, the same applies to NoSQL databases. One is able to easily create a hierarchy of data that abstracts the complexity. From an object-oriented programming perspective, this is pure bliss since you're able to map the objects to the data without the classic "imepance" mismatch that occurs when mapping objects to relational databases.
One of the issues we struggled with during our first efforts with NoSQL was deciding when to create new collections versus "embedding" attributes within existing collections. The rules we used were that we always chose embedding as the first alternative since, again, it was the most natural way to model the data. However, from a practical performance perspective, we do not embed a set of attributes if we feel the data volume is going to grow at a rate much faster than the parent attributes. For example, we might embed all sorts of attributes within a User collection that describes the user, but we would stop short of embedding the User in the Customer collection since we expect users to grow at a much faster rate than customers.
So, the primary reason you should consider NoSQL is that it is generally superior for managing complexity and, in most cases, for modelling real world data. Of course there are situations where data may lend itself better to the relational model. What I'm arguing is that NoSQL should be the default option for new applications and only in exceptional cases should relational databases be selected.
I'll close off this post with a pet-peeve. Nothing bothers me more than when I see developers using object-data mappers (ODMs) such as Mongoid to effectively recreate a relational database in a NoSQL database. This completely defeats the purpose of both types of database. You get none of the encapsulation benefits of NoSQL and your performance will be terrible since you're still relying on multiple joins which are not optimal in a NoSQL world.
Mark Clancy is a senior data engineer based in Vancouver, Canada.
All author posts