Couchbase Dev Day

I got down and dirty with Couchbase at their developer day in Boston. Beyond some headaches pushing data into TimesTen, I had no experience with a cache server, let alone NoSQL. Obviously a good starting place to play around with Couchbase!

Several parts of the Couchbase architecture really interested me:

  1. JSON Object Storage
  2. Memcached Layer
  3. Server failover

JSON

Couchbase is a key-value document store, meaning you have your id and a complete object model stored against it. This completely sidesteps the painful integration layer generally found with DAOs, callbacks, etc. Now, instead of querying a view or executing a SQL statement and mapping the returned array, we pull back a JSON object already in our desired format.

Another part of the JSON love may have originated from a little Java bashing from Jasdeep, after which I got Ruby set up and tried it the easy way. Well... OK, setting up Ruby and Couchbase is a bit of a pain unless you follow the steps documented on couchbaseonrails.com, as libcouchbase seems to trip everyone up.

Anyway, with Ruby up and running along with a sample bucket, I have to admit there was something nice about needing only a few lines to connect, pull a complete model into the script, and have it ready to be manipulated:

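require 'rubygems'
require 'couchbase'

# Connect to the beer-sample bucket on the local Couchbase server
client = Couchbase.connect(:bucket => "beer-sample", :hostname => "localhost")
# Fetch the complete document stored under this id
beer = client.get("aass_brewery-juleol")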

Finally, one of the major parts of Couchbase’s approach that is a real mind fu*k for RDBMS developers is the fact that the documents do not have to be identical. For example, you might have a standard Car JSON document (by standard I mean you have a standard naming convention for the document id). Within these documents you would have elements for, say, manufacturer, class, transmission, etc. We Java developers would tend to bring in inheritance to separate out maybe sports cars, saloons, and hatchbacks.

Within the JSON document, though, we simply include the elements that are applicable and trust the application to pull where type="sportscar", for instance. The sports car document contains all the standard car elements plus our sports-car-specific elements, but there are no frivolous saloon elements idling empty.
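To make that concrete, here is a minimal sketch in plain Ruby using only the stdlib json library. It is not a Couchbase view or query; the document ids, field names, and the cars_of_type helper are all made up for illustration, and a Hash stands in for the bucket.

```ruby
require 'json'

# Hypothetical car documents; ids and field names are invented for this sketch.
DOCS = {
  "car_saloon_001" => '{"type":"saloon","manufacturer":"Volvo","transmission":"automatic","boot_litres":480}',
  "car_sports_001" => '{"type":"sportscar","manufacturer":"Lotus","transmission":"manual","zero_to_sixty":4.1}',
  "car_sports_002" => '{"type":"sportscar","manufacturer":"Porsche","transmission":"manual","zero_to_sixty":3.9,"track_mode":true}'
}

# Parse every document and keep only those whose "type" element matches.
def cars_of_type(docs, type)
  docs.map { |id, raw| [id, JSON.parse(raw)] }
      .select { |_id, doc| doc["type"] == type }
      .to_h
end

sports = cars_of_type(DOCS, "sportscar")
sports.each { |id, doc| puts "#{id}: #{doc['manufacturer']} (#{doc.keys.join(', ')})" }
```

Note that the two sports car documents do not even share the same set of elements, and neither carries the saloon-only boot_litres field.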

Memcached

I have struggled over explain plans to eke out the last drop of performance. I just came from a class that discussed modifying the packet size and frequency with Oracle*Net, a slightly worrisome level of configuration.

Performance really looks like the driving force behind Couchbase’s architecture. Every query first uses the server look-up service to figure out which server holds the golden copy of the document, then checks that server’s memcached layer (its in-memory storage) before descending to the database itself.

From the cache layer, data is replicated to the other servers in the cluster via one queue, and another queue passes changes down to the underlying database. Jasdeep and John explained that when a document receives a lot of writes in quick succession, only the latest version is replicated and saved to the database.

This is a drawback in cases where we need to log every state an object goes through. Though maybe we could increase logging at the application layer and rely primarily on that.
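The read path described above can be sketched as a toy model. This is my own simplification, not Couchbase’s implementation: the Node and Cluster classes are hypothetical, and hashing the key with CRC32 modulo the node count is just a stand-in for the real look-up service.

```ruby
require 'zlib'

# A toy node: an in-memory cache in front of a slower "disk" store.
class Node
  attr_reader :name, :cache, :disk

  def initialize(name)
    @name  = name
    @cache = {}
    @disk  = {}
  end

  # Check memory first; on a miss, read from disk and warm the cache.
  def read(key)
    return [@cache[key], :cache] if @cache.key?(key)
    return [nil, :miss] unless @disk.key?(key)
    @cache[key] = @disk[key]
    [@cache[key], :disk]
  end
end

class Cluster
  def initialize(nodes)
    @nodes = nodes
  end

  # Stand-in for the server look-up: hash the key to pick the owning node.
  def owner_of(key)
    @nodes[Zlib.crc32(key) % @nodes.size]
  end

  def get(key)
    owner_of(key).read(key)
  end
end

cluster = Cluster.new([Node.new("node-a"), Node.new("node-b"), Node.new("node-c")])
key = "aass_brewery-juleol"
cluster.owner_of(key).disk[key] = { "type" => "beer" }
p cluster.get(key)  # first read falls through to disk
p cluster.get(key)  # second read is served from the cache
```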
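A rough way to picture the "only the latest version" behaviour is a queue keyed by document id, where a newer write simply overwrites an older one that has not been flushed yet. The sketch below is an assumption about the general idea, not Couchbase internals; CoalescingQueue, CachedBucket, and the order_42 id are all hypothetical.

```ruby
# A toy write queue that keeps only the newest pending value per document id.
class CoalescingQueue
  def initialize
    @pending = {} # id => latest value not yet flushed
  end

  # A newer write to the same id overwrites the older pending one.
  def enqueue(id, value)
    @pending[id] = value
  end

  # Hand all pending writes to the sink and clear the queue.
  # Returns how many documents were actually written.
  def flush(sink)
    @pending.each { |id, value| sink[id] = value }
    flushed = @pending.size
    @pending.clear
    flushed
  end
end

class CachedBucket
  attr_reader :cache, :disk, :replica

  def initialize
    @cache, @disk, @replica = {}, {}, {}
    @disk_queue    = CoalescingQueue.new
    @replica_queue = CoalescingQueue.new
  end

  # Writes land in memory first and are queued for replication and persistence.
  def set(id, value)
    @cache[id] = value
    @replica_queue.enqueue(id, value)
    @disk_queue.enqueue(id, value)
  end

  # Drain both queues; returns the number of documents written to disk.
  def drain
    @replica_queue.flush(@replica)
    @disk_queue.flush(@disk)
  end
end

bucket = CachedBucket.new
%w[pending paid shipped].each { |state| bucket.set("order_42", { "state" => state }) }
bucket.drain
p bucket.disk # the intermediate states never reached the disk store
```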

Server Fail-over

You can’t love distributed systems without loving their recovery process! Couchbase lets you determine the number of servers that you want each document replicated to. Personally, I do not want my document replicated to every instance in the cluster; that type of duplication just eats away at my resources and performance. When a server in the cluster does eventually fall over, the rest of the cluster is rebalanced and the server maps are updated with the new location of each document.
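Here is a small sketch of how a configurable replica count and a server map update might fit together. It is a toy model under my own assumptions (the ClusterMap class, the ring-style placement, and the node names are all hypothetical), and it moves no data; it only tracks which node is supposed to hold which copy.

```ruby
require 'zlib'

# Toy server map: each document has one active node plus a fixed number of replicas.
class ClusterMap
  def initialize(nodes, replicas:)
    raise ArgumentError, "need more nodes than replicas" if replicas >= nodes.size
    @nodes      = nodes.dup
    @replicas   = replicas
    @placements = {} # id => { active: node, replicas: [nodes] }
  end

  # Hash the id to pick the active node; replicas are the next nodes around the ring.
  def place(id)
    ring = @nodes.rotate(Zlib.crc32(id) % @nodes.size)
    @placements[id] = { active: ring.first, replicas: ring[1, @replicas] }
  end

  def lookup(id)
    @placements.fetch(id)
  end

  # Remove a dead node and update every placement that referenced it.
  def fail_over(dead)
    @nodes.delete(dead)
    @placements.each_value do |p|
      copies = ([p[:active]] + p[:replicas]) - [dead]     # surviving copies, active first
      spare  = @nodes - copies                            # nodes holding no copy yet
      copies += spare.first(@replicas + 1 - copies.size)  # top the copy count back up
      p[:active]   = copies.first                         # a surviving replica is promoted
      p[:replicas] = copies.drop(1)
    end
  end
end

map = ClusterMap.new(%w[node-a node-b node-c node-d], replicas: 1)
map.place("car_sports_001")
owner = map.lookup("car_sports_001")[:active]
map.fail_over(owner)
p map.lookup("car_sports_001") # the old replica is now the active copy
```

With replicas set to 1, each document lives on two of the four nodes rather than all of them, which is the trade-off between safety and duplication mentioned above.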

Still…

Still, my main gripe is the inability to load balance a single document. There is only ever one golden copy, so a common document that receives frequent updates will leave its server bushwhacked. There are certainly ways around it: maintaining multiple copies of the document, with one document linking to the other ids, springs to mind. Yet it would be nice to dynamically bring up two copies and maybe somehow prioritize the replication between the cache layers for these documents.

A proper solution no doubt involves reviewing the application architecture and reducing the need to call the common document as much as possible. Maybe someone has a post on a good pattern to use; a Google search for another day, I guess.
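The multiple-copies workaround could look something like the sketch below. It is only an illustration of the idea, not a tested pattern: the FannedOutDocument class and the "::index" / "::copyN" id convention are hypothetical, and a plain Hash stands in for the bucket (in a real cluster each id could hash to a different server).

```ruby
# One index document lists the ids of several identical copies of a hot document.
class FannedOutDocument
  def initialize(store, base_id, copies)
    @store    = store
    @index_id = "#{base_id}::index"
    ids = (1..copies).map { |n| "#{base_id}::copy#{n}" }
    @store[@index_id] = { "copies" => ids }
  end

  # Ids of all copies, as recorded in the index document.
  def copy_ids
    @store[@index_id]["copies"]
  end

  # Writes touch every copy, trading extra write cost for spread-out reads.
  def write(value)
    copy_ids.each { |id| @store[id] = value }
  end

  # Reads pick one copy at random to spread the load across ids.
  def read(rng = Random.new)
    @store[copy_ids.sample(random: rng)]
  end
end

store = {}
rates = FannedOutDocument.new(store, "rates", 3)
rates.write({ "usd" => 1.0 })
p rates.copy_ids
p rates.read
```

The obvious catch is that the index document becomes the new hot spot unless clients cache the list of ids, and every update now costs one write per copy, so this only helps when reads dominate.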