Sunday, May 2, 2010

If U Wanna Grow, Learn To Say NoSQL

In this post I will try to give a rough overview of wtf a NoSQL database is, and why and when to use one.

Well, we all know how cool and easy it is, even when programming a simple application, to store the data of the application's model in a relational SQL database like MySQL or PostgreSQL. The data is easy to access with the well-known SQL language, the vast majority of developers are familiar with the principles of building such a database, like the SQL normal forms, and there are plenty of mature open-source libraries with pretty good documentation and community, etc.

So why invent or even try something new when we have this established relational database ecosystem? Because in the last few years it has turned out that relational databases do not scale out well across many machines, due to the complicated relations between the data they handle. So what was considered a great advantage of relational databases during their decades of fame has become a disadvantage. Today's huge internet applications like Facebook, Google or Twitter weren't able to handle their bazillion pieces of data in real time with them. At first this was partially solved by denormalizing their relational database models, but it wasn't as effective as they had expected. The database infrastructure necessary to satisfy their needs became very expensive, without the desired impact on the speed of the application. With a NoSQL database you have to give up some relational database properties, like full ACID transactions or strict consistency guarantees, but on the other hand you get built-in partitioning, load balancing, transparent replication and great scaling possibilities, with the ability to add capacity without any impact on the applications running against the database.
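To make the partitioning idea a bit more concrete, here is a rough Python sketch of how a key-value store might spread keys over several nodes by hashing the key. Everything in it (the `PartitionedStore` class, the `node_for` helper, the node names) is made up for illustration and is not the API of any real database. It also uses plain modulo hashing, which is the simplest possible scheme; real stores typically use something more refined, such as consistent hashing, so that adding a node moves only a small share of the keys.

```python
import hashlib


def node_for(key, node_names):
    """Pick a node for a key by hashing it (simple modulo partitioning)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return node_names[int(digest, 16) % len(node_names)]


class PartitionedStore:
    """Toy key-value store; each 'node' is just an in-memory dict."""

    def __init__(self, node_names):
        self.node_names = list(node_names)
        self.nodes = {name: {} for name in self.node_names}

    def put(self, key, value):
        # The caller never says where the data lives; the hash decides.
        self.nodes[node_for(key, self.node_names)][key] = value

    def get(self, key):
        # The same hash leads back to the node that holds the key.
        return self.nodes[node_for(key, self.node_names)].get(key)


store = PartitionedStore(["node-a", "node-b", "node-c"])
store.put("user:1", {"name": "alice"})
store.put("user:2", {"name": "bob"})
print(store.get("user:1"))
```

The application only ever calls `put` and `get` with a key, so which machine holds the data stays hidden from it. That is the property that lets such a store grow by adding nodes.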

I don't want to say that SQL databases are bad or generally unusable. Not at all. I think that SQL-related technology is great and will do its job perfectly for most of your projects. I just want to say that if you are planning to be really (really) BIG, you should consider using a technology other than SQL.

So here comes NoSQL

First of all, NoSQL is a big paradigm shift in modeling data and it can be pretty confusing at the beginning. Forget all these JOINs, GROUP BYs, foreign keys, rigid schemas and consistency guarantees. The basic concept behind most NoSQL databases is to store data as key-value pairs. In many of them, tables and fixed designs are replaced by document-oriented storage (see the picture below). Those of you familiar with JSON should cope with that very quickly. The principle of NoSQL is to keep all pieces of related information together so that it's easy to fetch them. So as a developer you have to model such a database with the "queries" that your application will need already in mind. In other words, you have to know what kind of operations on the dataset you will need. Yes, this approach places higher demands on the developer's abilities, and yes, it's not easy to get familiar with this concept, but give it time.
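Here is a small Python sketch of the difference, using plain dicts and lists instead of a real database. The blog data, the `post:10` key format and both function names are invented for this example. The first function assembles a post from three separate "tables", roughly what a JOIN would do. The second reads one document that already holds everything needed to show the post.

```python
# Hypothetical blog data, invented for illustration only.

# Relational style: rows live in separate tables, linked by ids.
users = {1: {"name": "alice"}}
posts = {10: {"user_id": 1, "title": "Hello NoSQL"}}
comments = [
    {"post_id": 10, "author": "bob", "text": "Nice post"},
    {"post_id": 10, "author": "carol", "text": "Thanks"},
]


def load_post_relational(post_id):
    """Assemble one post from three 'tables' (what a JOIN would do)."""
    post = posts[post_id]
    return {
        "title": post["title"],
        "author": users[post["user_id"]]["name"],
        "comments": [
            {"author": c["author"], "text": c["text"]}
            for c in comments
            if c["post_id"] == post_id
        ],
    }


# Document style: everything needed to render the post sits under one key.
document_store = {
    "post:10": {
        "title": "Hello NoSQL",
        "author": "alice",
        "comments": [
            {"author": "bob", "text": "Nice post"},
            {"author": "carol", "text": "Thanks"},
        ],
    }
}


def load_post_document(post_id):
    """One lookup by key, no joins."""
    return document_store[f"post:{post_id}"]
```

Both functions return the same structure for post 10, but the document version gets it with a single lookup. The price is that the document was shaped around one particular "query" (show a post with its comments), which is exactly the modeling decision described above.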



Examples of NoSQL databases

NoSQL databases are being developed to handle very large amounts of data effectively. Their development is kinda fresh, which means that they are optimized for modern concurrency-driven environments and clouds. As examples we can mention MongoDB and CouchDB. A nice example of the connection between the database world and the cloud world is the support that the Apache Cassandra project (formerly proprietary Facebook code) gets from Rackspace, a company running cloud server hosting services.

I'm planning some more in-depth looks into the world of NoSQL in the form of examples and tutorials. So if you are interested in this topic, stay tuned.

3 comments:

  1. Nice post, I agree NOSQL needs to be used when it's right to use it. I'm worried that people are just jumping on a bandwagon though; the relational database, with SQL as the abstracted access language, has a lot of benefits, like most people knowing SQL and it being easy for tools to drag stuff out.

    Distributed SQL based databases are expensive and don't scale out without significant investment; I'm waiting for somebody to build a NOSQL solution on top of the real relational model - the model that SQL bastardises; the relational model itself doesn't have ACID - that's part of the implementation that was chosen by SQL based products.

  2. Thank you for your comment (1st on this blog :)). I think that the price can be one of the main motivations to try the NoSQL paradigm because, as you've said, it's very expensive to buy enterprise-class distributed relational databases. NoSQL offers a much cheaper solution, proven by the biggest players on the Internet.

    You're right about ACID, but good SQL databases have it implemented.

    Do you have any experience working with NoSQL? Or have you worked with some of the DBs mentioned above?
