Friday, May 14, 2010

About Drizzle

Today I'd like to shed some light on a fairly new project in the field of relational databases. It's called Drizzle, and its objective is to compete with and even beat MySQL. Drizzle started as a fork of MySQL 6.0 by Brian Aker. The vision was to make a lightweight database system that suits the needs of modern web applications, and to trim off all the unnecessary features stuffed into MySQL that make it slower and bigger.

Nowadays, Drizzle is a fresh, living, community-driven database developed with clouds, or more generally with high-concurrency environments, in mind. The main features are:
  • Microkernel architecture
  • Modularity (modules responsible for additional functionality)
  • MySQL communication protocol (a new Drizzle protocol is under development)
  • InnoDB as the standard storage engine (but you can also use Drizzle with a non-transactional storage engine such as BlitzDB, a new general-purpose storage engine being developed along with Drizzle)
  • Written in C++ and pretty small in comparison with MySQL
  • Small and fast
  • Some (rarely needed) functionality is missing
The fact that Drizzle is being developed to be cloud-ready is reflected in the tight partnership between the Drizzle project and Rackspace. Rackspace is one of the main sponsors of Drizzle and pays a few developers to work full-time on the Drizzle code.

I think Drizzle could be a good option when you don't want to give up the traditional relational database paradigm and don't want to mess with whatever_it_is NoSQL stuff. If you want a small and fast database that scales well in a distributed environment, you should try Drizzle.

Do you have any experience with Drizzle or would you suggest some other DB system?

Saturday, May 8, 2010

Cloud Hosting Price Comparison

So let's look at some cloud hosting services and their cost. We will compare the pricing of on-demand cloud servers from Amazon EC2, Rackspace, ReliaCloud and Joyent. It's not an easy task because the hosting options differ in aspects like server hardware configuration, supported operating systems and the support provided. This is not a benchmark or a test, just an overview of the options on the market. I will put the emphasis on the cheapest, entry-level solutions and prices.

Amazon EC2
EC2 is the biggest and most established cloud solution on the market, and Amazon is a pioneer in cloud computing. There is plenty of information about Amazon's cloud solutions, and you will find many big companies among Amazon's customers. However, the Amazon cloud ecosystem has its weak points. For example, the cheapest plan will cost you about $60/month ($0.085/hour) for 1.7GB RAM, 1 EC2 Compute Unit and 160GB of local storage. On the other hand, Amazon has very sophisticated pricing options, and in some cases you can get very competitive configurations. Another thing is that some people may be uncomfortable with the fact that Amazon server instances aren't persistent. It means that if your server goes down (shutdown, hardware failure), it also vanishes from your cloud along with its local data, and you can restore it only by spinning up a completely new server (so you should back up quite often). It's also a bit more complicated to get your own static IP address than it is with Amazon's competitors. Amazon's shared storage product is called EBS (Elastic Block Store), a block-level storage volume that EC2 instances can mount. Another option is Amazon S3, a classic Internet storage service accessible via web services where you can store practically any amount of data. S3 pricing starts at $0.15/GB/month. You also have to pay if you want to use Amazon's CDN.
Amazon is said to be slower in terms of CPU power and network throughput. Direct support from Amazon is very poor, but there is a big community around its products, and it has a well-established place among cloud solutions. I admit that this article is more about encouraging you to try some other cloud solutions than about sending you to Amazon.

  • Cheapest plan: $0.085/hour for 1.7GB RAM, 160GB local storage, 1 EC2 Compute Unit; with long-term prepaid plans the price drops sharply
  • Transient (non-persistent) servers
  • S3 Internet storage: $0.15/GB/month (can be lower if you store a bazillion GBs)
  • Data transfer: in is free for now ($0.10/GB from June 2010); out is $0.15/GB
  • You have to pay for support
  • Big community, really established products
  • Featured customers: plenty of big companies

Rackspace Cloud Servers
With Rackspace you can start as low as $0.015/hour or $11/month for 256MB RAM and 10GB of local storage (see the complete pricing for all options). CPU power is guaranteed, with a free burst when extra capacity is available on the host, and according to benchmarks it is very good (at least better than EC2). Their server instances are persistent and have one static IP address, so the system changes you've made on the server will be there until you decide to delete the server. You can have additional IPs for $2/IP/month. Bandwidth for your Cloud Servers is charged separately - Out $0.22/GB, In $0.08/GB. Their Internet storage (similar to Amazon's S3) is called Cloud Files. There is a public API for storing and fetching your data. You pay $0.15 per stored GB, and bandwidth costs the same as for Cloud Servers. The great thing is that you pay nothing for using Rackspace's CDN, which can bring significant savings. Rackspace is well known for its 24/7 Fanatical Support(tm), and it also has a great knowledge base with plenty of information on how to administer your servers. One big difference between Amazon and Rackspace is that you can manage your Cloud Servers via a very nice web-based cloud control application (iPhone and Android apps are also available) and access them via a web-based console.
  • Cheapest plan: $0.015/hour or $11/month for 256MB RAM, 10GB local storage
  • Shared storage: $0.15/GB/month with CDN for free
  • Data in $0.08/GB, out $0.22/GB
  • The best support, very nice UI for managing clouds
  • Many operating systems that you can deploy
  • Featured customers: Posterous, TechCrunch and many others

ReliaCloud
ReliaCloud on-demand servers start at $0.05/hour for 512MB RAM and 50GB local storage, but while your server is not running you pay only $0.025/hour. It comes with one free static IP address, and you can have additional IPs for $1/IP/month. There are plenty of operating systems that you can deploy, though not as many as at Rackspace. ReliaCloud also has some kind of knowledge base, but again I would say that Rackspace does this better. You also get 24/7 live chat support at no cost and some tutorial videos to get started. A good thing is that you pay nothing for inbound traffic. For outbound traffic you will pay $0.12/GB. ReliaCloud doesn't provide Internet storage similar to S3 or Rackspace Cloud Files. The UI of the web-based control app is very good and similar to the one Rackspace has.

  • Cheapest plan: $0.05/hour for 512MB RAM, 50GB local storage, 1 CPU
  • Inbound traffic free, outbound $0.12/GB
  • No Internet storage, no CDN
  • Nice UI for managing the cloud and pretty good support
  • Featured customers: Preston Kelly
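Because ReliaCloud charges a different rate for a stopped server, the monthly bill depends on how many hours the server actually runs. Here is a minimal sketch of that arithmetic, using the two hourly rates quoted in the paragraph above. The 720-hour month and the function name are my own assumptions for illustration, and bandwidth and extra IPs are left out.

```python
# Hourly rates quoted above for the entry-level ReliaCloud server (USD).
RUNNING_RATE = 0.05   # while the server is running
STOPPED_RATE = 0.025  # while the server is stopped


def reliacloud_monthly_cost(running_hours, total_hours=720):
    """Estimate the monthly bill, assuming a 30-day (720-hour) month."""
    if not 0 <= running_hours <= total_hours:
        raise ValueError("running_hours must be between 0 and total_hours")
    stopped_hours = total_hours - running_hours
    cost = running_hours * RUNNING_RATE + stopped_hours * STOPPED_RATE
    return round(cost, 2)


if __name__ == "__main__":
    # A server that runs 8 hours a day, compared with one that never stops.
    print("8 hours/day:", reliacloud_monthly_cost(8 * 30))
    print("always on:  ", reliacloud_monthly_cost(720))
```

So a stopped server is cheaper, but it is not free: under these assumptions even a server that never runs still costs half of the always-on price.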

Joyent
Joyent also has its specifics. Their basic cloud building blocks are called Joyent Accelerators. For some specific applications like MySQL or the GlassFish enterprise server they offer software packages called Joyent Virtual Appliances that tailor your Joyent Accelerator to run such an application very effectively (but they cost extra). For example, Joyent's website says that their Joyent Accelerator for MySQL can handle 3x more transactions per second than EC2. Joyent solutions are closely bound to the Sun (now Oracle) ecosystem, so if you want to get along with Joyent, you should be familiar with OpenSolaris. Accelerators have an automatic CPU bursting feature, so you can usually expect more CPU power than the minimum of the plan you prepaid. Another piece of good news is that all prices include 10TB of data transfer per month per customer. The bad news is that you have to pay extra for using the CDN, starting at $25 for 50GB/month. It seems to me that it is Joyent's business strategy to get paid for these "extra things" (e.g. an additional IP address costs $60/year). Support (a pretty poor knowledge base and no 24/7 help) is not as good as what you can expect from Rackspace or ReliaCloud. For further details see the complete pricing.
  • Cheapest plan: $25/month for 256MB RAM, 1/16 core (can burst to 1.5 cores), 5GB local storage; there are no hourly fees, so you have to pay for at least one month
  • Shared storage $0.15/GB/month
  • CDN starting at $25/50GB transfer
  • You have to be familiar with the Solaris system
  • Average support
  • Featured customers: LinkedIn, Gilt Groupe

So this is a rough overview of selected on-demand cloud server services. Each one has its pros and cons. Amazon is the market leader; on the other hand, Rackspace is innovating quickly and has great, free customer support. ReliaCloud's offer is a bit limited in variety, but they also have good support, and the prices aren't bad for what you get. Joyent looks good at first sight, but then you realize that you have to pay a lot for things that the others offer for free or for a marginal price.
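To put the entry-level prices side by side, the hourly rates can be converted to an approximate monthly cost. The sketch below uses only the prices quoted in this post; the 30-day month and the helper names are my own assumptions. Keep in mind that the plans are not equivalent (1.7GB RAM at Amazon versus 256MB at Rackspace, for example), so this compares the cost of getting started, not value for money.

```python
HOURS_PER_MONTH = 24 * 30  # assumption: a 30-day month

# Entry-level hourly prices quoted in this post (USD per hour).
HOURLY_PLANS = {
    "Amazon EC2": 0.085,
    "Rackspace Cloud Servers": 0.015,
    "ReliaCloud": 0.05,
}

# Joyent has no hourly billing; its cheapest plan is a flat monthly fee.
FLAT_MONTHLY_PLANS = {"Joyent": 25.0}


def monthly_costs():
    """Return the approximate monthly cost of each always-on entry-level server."""
    costs = {
        name: round(rate * HOURS_PER_MONTH, 2)
        for name, rate in HOURLY_PLANS.items()
    }
    costs.update(FLAT_MONTHLY_PLANS)
    return costs


def cheapest_first():
    """Return (provider, monthly cost) pairs sorted from cheapest to priciest."""
    return sorted(monthly_costs().items(), key=lambda item: item[1])


if __name__ == "__main__":
    for provider, cost in cheapest_first():
        print(f"{provider}: ${cost:.2f}/month")
```

Bandwidth, storage and extra IP addresses are not included, and those can change the picture quite a bit depending on your traffic.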

I hope this post helped you decide which cloud solution you should start experimenting with. But I encourage you to try all of them, because it's good to get your hands on every option on the market and then decide which one suits you best. Given the hourly prices mentioned above, you can try the hourly-billed providers for a few hours each and pay no more than a few bucks. So which one will you start with? Amazon, Rackspace, ReliaCloud or Joyent?

If you have some experience with using these on-demand servers, please feel free to leave a comment.

Thursday, May 6, 2010

Rackspace Cloud Android App Released

Rackspace has made its new app for managing its cloud hosting service available through the Android Market, so you can now administer your cloud from your Android phone. The Rackspace blog post lists what you can do with the app.

The cool thing is that the code of the application is published as open source, so you can download it or even contribute to the project. See the project's GitHub page.

This is just the first release, so I'm looking forward to further development and enhancements.

Sunday, May 2, 2010

If U Wanna Grow, Learn To Say NoSQL

In this post I will try to give a rough overview of wtf a NoSQL database is, and why and when to use one.

Well, we all know how cool and easy it is, when programming even a simple application, to have the data of the application's model stored in a relational SQL database like MySQL or PostgreSQL. It's easy to access with the well-known SQL language, the vast majority of developers are familiar with the principles of building such a database, like the normal forms, there are plenty of mature open-source libraries with pretty good documentation and communities, etc.

So why invent or even try something new when we have this established relational database ecosystem? Because in the last few years it has turned out that relational databases cannot scale properly, due to the complicated relations between the data they handle. So what was considered a great advantage of relational databases during their decades of fame has become a disadvantage. Today's vast internet applications like Facebook, Google or Twitter weren't able to handle their bazillion pieces of data in real time. At first this was partially solved by denormalizing their relational database models, but it wasn't as effective as they had expected. The database infrastructure necessary to satisfy their needs became very expensive without the desired impact on the speed of the application. So with a NoSQL database you have to give up some relational database properties like ACID or consistency guarantees, but on the other hand you get built-in partitioning, load balancing, transparent replication and great scaling possibilities, with the ability to add capacity without any impact on applications running against the database.
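To give an idea of what "partitioning" means here, below is a deliberately naive sketch in Python: each key is hashed and the hash decides which server stores it. The node names and both functions are made up for illustration and are not taken from any particular database. Real systems usually rely on more refined schemes, such as consistent hashing, so that adding a node moves only a small share of the keys, which plain modulo hashing does not achieve.

```python
import hashlib


def node_for_key(key, nodes):
    """Pick the node responsible for a key (naive modulo partitioning)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]


def partition(keys, nodes):
    """Group keys by the node that would store them."""
    placement = {node: [] for node in nodes}
    for key in keys:
        placement[node_for_key(key, nodes)].append(key)
    return placement


if __name__ == "__main__":
    nodes = ["node-a", "node-b", "node-c"]  # hypothetical server names
    keys = [f"user:{i}" for i in range(10)]
    for node, stored in partition(keys, nodes).items():
        print(node, stored)
```

The application only ever asks for a key; which machine holds the data is decided by the database layer, and that is what makes this kind of scaling transparent.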

I don't want to say that SQL databases are bad or generally unusable. Not at all. I think that SQL-related technology is great and will do its job perfectly for most of your projects. I just want to say that if you are planning to be really (really) BIG, you should consider using a technology other than SQL.

So here comes NoSQL

First of all, NoSQL is a big paradigm shift in modeling data, and it can be pretty confusing at the beginning. Forget all those JOINs, GROUP BYs, foreign keys, rigid schemas and consistency guarantees. The basic concept of NoSQL databases is to store only key-value pairs. Tables and their designs are replaced by document-oriented storage (see the picture below). Those of you familiar with JSON should get the hang of it very quickly. The principle of NoSQL is to keep all pieces of related information together so that it's easy to fetch them. So as a developer you have to model such a database with the "queries" that your application will need already in mind. In other words, you have to know what kind of operations you will need to run over the dataset. Yes, this approach places higher demands on the developer's abilities, and yes, it's not easy to get familiar with this concept, but give it time.
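As a small illustration of keeping related information together, here is a toy sketch in plain Python. It is not the API of any real NoSQL database: the "store" is just a dictionary, and the document, its field names and the `put`/`get` helpers are all made up for this example. A blog post is saved as one JSON document under a single key, comments included, so one lookup brings back everything needed to render the page, with no JOIN.

```python
import json

# A hypothetical blog post stored as one self-contained document.
post_document = {
    "title": "About Drizzle",
    "tags": ["drizzle", "mysql"],
    "comments": [
        {"author": "alice", "text": "Nice overview."},
        {"author": "bob", "text": "What about PostgreSQL?"},
    ],
}

# Toy key-value store: key = document id, value = serialized JSON.
store = {}


def put(key, document):
    """Serialize the document and save it under the given key."""
    store[key] = json.dumps(document)


def get(key):
    """Fetch the document stored under the key and deserialize it."""
    return json.loads(store[key])


put("post:about-drizzle", post_document)

# One lookup returns the post together with its tags and comments.
fetched = get("post:about-drizzle")
comment_authors = [comment["author"] for comment in fetched["comments"]]
print(fetched["title"], comment_authors)
```

The price is that the document is shaped for one access pattern. If you later need, say, all comments by one author across all posts, you have to plan a different document or an extra index for that query.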

Examples of NoSQL databases

NoSQL databases are being developed to handle any amount of data effectively. Their development is fairly recent, which means that they are optimized for modern concurrency-driven environments and clouds. As examples we can mention MongoDB and CouchDB. A nice example of a connection between the database world and the cloud world is the support that the Apache Cassandra project (formerly proprietary Facebook code) gets from Rackspace, a company running cloud server hosting services.

I'm planning some more in-depth looks into the world of NoSQL in the form of examples and tutorials. So if you are interested in this topic, stay tuned.