Interfacing with Apache Cassandra 0.8 in Java.


Apache Cassandra is a NoSQL database that was developed at Facebook to power its Inbox Search system. It is used by prominent companies such as Reddit, Twitter, Rackspace and Digg, to name but a few.

Cassandra's data model is based on column families, which are indexed by row key. The basic unit of storage is the column, a name-value pair. Columns are grouped into rows, and rows are grouped into a column family. A row can hold a very large number of columns, and every row key within a column family must be unique. I don't want to go into the details of column families here, so I'll rather point you to an article that you should read before venturing further into this post. It describes the Cassandra data model well: WTF is a SuperColumn? An Intro to the Cassandra Data Model.
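To make the shape of a column family a little more concrete, here is a minimal in-memory sketch in plain Java. It is only an illustration of the structure (row key, then column name, then column) and is not the Cassandra API; the class name, method names and sample data are all made up for this example. Cassandra also stores a timestamp with each column, which the sketch records but does not otherwise use.

```java
import java.util.Map;
import java.util.TreeMap;

// In-memory illustration only: this is NOT the Cassandra API.
class ColumnFamilySketch {

    /** A column: the basic unit of storage (name, value, timestamp). */
    static final class Column {
        final String name;
        final String value;
        final long timestamp;

        Column(String name, String value, long timestamp) {
            this.name = name;
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // Row key -> (column name -> column). TreeMap keeps the columns of a row
    // sorted by name, loosely mirroring how columns are ordered within a row.
    private final Map<String, Map<String, Column>> rows =
            new TreeMap<String, Map<String, Column>>();

    /** Inserts a column into the row, overwriting any column with the same name. */
    void insert(String rowKey, String columnName, String value) {
        Map<String, Column> row = rows.get(rowKey);
        if (row == null) {
            row = new TreeMap<String, Column>();
            rows.put(rowKey, row);
        }
        row.put(columnName, new Column(columnName, value, System.currentTimeMillis()));
    }

    /** Returns the column's value, or null if the row or the column is absent. */
    String get(String rowKey, String columnName) {
        Map<String, Column> row = rows.get(rowKey);
        if (row == null) {
            return null;
        }
        Column column = row.get(columnName);
        return column == null ? null : column.value;
    }

    public static void main(String[] args) {
        ColumnFamilySketch authors = new ColumnFamilySketch();
        // Hypothetical sample data for an "Authors" column family.
        authors.insert("author-1", "name", "Jane Doe");
        authors.insert("author-1", "email", "jane@example.com");
        System.out.println(authors.get("author-1", "name"));
    }
}
```

Each row key maps to its own set of columns, so two rows in the same column family do not need to have the same column names.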

This is a simple introduction to Cassandra 0.8, as there have been major changes from version 0.6 to the current version.

 

Prerequisites.

Cassandra is written in Java, so you will need a Java installation in order to run it. I'm running JDK 1.6.0_25, but any JDK 5 or higher should do.

Downloading and Unzipping Apache Cassandra.

To begin, download Apache Cassandra from the Apache Cassandra Download Page (I've downloaded the latest version, 0.8.2, as apache-cassandra-0.8.2-bin.tar.gz). Extract the archive to the root directory (preferable for Windows users) or to a directory of your choice. I'm using Windows 7 Ultimate, and I've extracted the archive to the c:\apache-cassandra-0.8.2\ folder.


Before we continue, we need to set up two important environment variables: CASSANDRA_HOME and JAVA_HOME. CASSANDRA_HOME must point to your Cassandra directory (without the bin folder) and JAVA_HOME must point to your Java directory. (You should know this by now if you're a Java developer! .NET folks, there's Google! :p)
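If you want to double-check that both variables are visible to a Java process, a small sketch like the one below can help. The class and method names are made up for this example; it only reports which of the two variables are missing or empty and does not verify that the paths are correct.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class EnvCheckSketch {

    /** Returns the names of the required variables that are missing or blank in env. */
    static List<String> missing(Map<String, String> env) {
        List<String> result = new ArrayList<String>();
        for (String name : new String[] {"CASSANDRA_HOME", "JAVA_HOME"}) {
            String value = env.get(name);
            if (value == null || value.trim().isEmpty()) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // System.getenv() returns the environment of the current process.
        List<String> notSet = missing(System.getenv());
        if (notSet.isEmpty()) {
            System.out.println("CASSANDRA_HOME = " + System.getenv("CASSANDRA_HOME"));
            System.out.println("JAVA_HOME = " + System.getenv("JAVA_HOME"));
        } else {
            System.out.println("Not set: " + notSet);
        }
    }
}
```

Remember that a shell or command prompt opened before you set the variables will usually not see them, so run the check from a fresh window.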


Now that we're set, let's have some fun!

Setting up and running Apache Cassandra.

Prior to Apache Cassandra 0.7, you had a storage configuration file called storage-conf.xml in your CASSANDRA_HOME/conf folder (if memory serves me correctly). This no longer applies from Apache Cassandra 0.7 onwards: the storage configuration now lives in the CASSANDRA_HOME/conf/cassandra.yaml file. For more information on storage configuration in Cassandra, visit the Apache Cassandra Wiki StorageConfiguration page.

To run Cassandra, go to the CASSANDRA_HOME/bin folder and type the following command:

cassandra -f

(The -f option tells Cassandra to run in the foreground as a non-daemon process.)

If you want to record the Cassandra process id in a file, simply use the -p option, e.g. cassandra -p /var/cassandra.pid

If Cassandra has started successfully, a message like the following is displayed in your shell or command prompt (ignore the date/time stamp):

INFO [Thread-4] 2011-08-03 12:33:48,880 CassandraDaemon.java (line 145) Listening for thrift clients...

If you see the text "Listening for thrift clients..." then you should start smiling. :-)

Now, we need to see if Cassandra is truly running. Start Cassandra-CLI (the command line interface) in another shell or command prompt. In the CASSANDRA_HOME/bin folder, type:

cassandra-cli

The output should be similar to the following:

Starting Cassandra Client
Welcome to the Cassandra CLI.


Type 'help;' or '?' for help.
Type 'quit;' or 'exit;' to quit.


[default@unknown]

Once you see the text [default@unknown] on your screen, type the following command. (In Cassandra CLI, every command is terminated with a semi-colon. If you leave it out, the CLI displays an ellipsis, ..., and waits for the semi-colon.)

connect localhost/9160; 

A successful response to the above command will be something to this effect:

Connected to: "Test Cluster" on localhost/9160

Another way to connect to Cassandra with Cassandra CLI is to pass the connection parameters when calling cassandra-cli, as follows:

cassandra-cli -host localhost -port 9160

(Cassandra's default port is 9160.) The result is exactly as above.

Now, let's see if Cassandra is truly listening (for thrift clients). In Cassandra CLI, type:

show keyspaces;

If you see a list of keyspaces (default keyspace is "system") then Cassandra is up and running.
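The same "is anything listening?" question can also be asked from Java before you attempt a real client connection. The sketch below is a plain TCP check with made-up class and method names. It only tells you that something accepts connections on the port and does not speak Thrift, so it cannot confirm that the listener is actually Cassandra.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

class PortCheckSketch {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean isListening(String host, int port, int timeoutMillis) {
        Socket socket = new Socket();
        try {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host unreachable.
            return false;
        } finally {
            try {
                socket.close();
            } catch (IOException ignored) {
                // Nothing useful to do if closing fails.
            }
        }
    }

    public static void main(String[] args) {
        // 9160 is the default port used in the cassandra-cli examples.
        System.out.println("localhost:9160 listening? "
                + isListening("localhost", 9160, 2000));
    }
}
```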

Type help; on the CLI to see a list of available Cassandra CLI commands. :-)

 

Java Examples: Using Apache Thrift.

For this demonstration, I am using Thrift 0.6.1 (latest, at the time of writing) from Apache Thrift.

Cassandra now expects java.nio.ByteBuffer where Cassandra 0.6 and lower used byte arrays. I suggest that you always refer to the Cassandra API Wiki when you're interfacing with Cassandra through Thrift.
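Since keys, column names and values now travel as ByteBuffer, you will find yourself converting strings back and forth quite often. Here is a minimal sketch of such helpers using only the JDK. The class and method names are my own, and I am assuming UTF-8 as the encoding.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;

class ByteBufferSketch {

    // Assumption: strings are encoded as UTF-8.
    private static final Charset UTF_8 = Charset.forName("UTF-8");

    /** Wraps the UTF-8 bytes of text in a ByteBuffer. */
    static ByteBuffer toByteBuffer(String text) {
        return ByteBuffer.wrap(text.getBytes(UTF_8));
    }

    /** Copies the remaining bytes out of the buffer without moving its position. */
    static byte[] toBytes(ByteBuffer buffer) {
        // duplicate() shares the content but has its own position and limit,
        // so reading from the copy leaves the caller's buffer untouched.
        ByteBuffer copy = buffer.duplicate();
        byte[] bytes = new byte[copy.remaining()];
        copy.get(bytes);
        return bytes;
    }

    /** Decodes the remaining bytes of the buffer as UTF-8 text. */
    static String toText(ByteBuffer buffer) {
        return new String(toBytes(buffer), UTF_8);
    }

    public static void main(String[] args) {
        ByteBuffer name = toByteBuffer("Authors");
        System.out.println(toText(name));
    }
}
```

Reading through a duplicate rather than calling buffer.array() is deliberate: the backing array can be larger than the part of the buffer that actually holds your data, for instance when the buffer is a slice of a bigger one.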

I will demonstrate how to create a new keyspace, "Keyspace1", and a column family called "Authors" using Apache Thrift in Java. Note: prior to Cassandra 0.7, keyspace definitions, along with all their column family declarations, were written in storage-conf.xml. This doesn't apply anymore: you will have to write code to create your keyspace and column family definitions.

My keyspace and Authors column family is pictured as follows:


And here's the full demonstration code on how I achieved this: I've posted comments for easy understanding :-)




As you can see, this code throws an UnavailableException. The Cassandra API gives only a one-line explanation of what UnavailableException means:


Not all the replicas required could be created and/or read.

The cause is simple: I am using SimpleStrategy, and SimpleStrategy requires a replication_factor to be set. For NetworkTopologyStrategy, you have to specify each data centre and its number of replicas in the strategy_options (more information can be found here). So we need to add a replication factor. Since Cassandra 0.8 no longer uses the integer replication_factor field, we need to add it as an entry in the Map<String, String> strategy_options instead.

The following code shows how:
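A minimal sketch of the idea, using only java.util collections rather than the Thrift classes, so it runs without Cassandra on the classpath. The class and method names are made up for this example. The "replication_factor" key is the one SimpleStrategy looks for, and "datacenter1" is just an example data centre name.

```java
import java.util.HashMap;
import java.util.Map;

class StrategyOptionsSketch {

    /** Options for SimpleStrategy: a single "replication_factor" entry. */
    static Map<String, String> simpleStrategy(int replicationFactor) {
        Map<String, String> options = new HashMap<String, String>();
        // Values are strings because strategy_options is a Map<String, String>.
        options.put("replication_factor", String.valueOf(replicationFactor));
        return options;
    }

    /** Options for NetworkTopologyStrategy: one entry per data centre. */
    static Map<String, String> networkTopology(Map<String, Integer> replicasPerDataCentre) {
        Map<String, String> options = new HashMap<String, String>();
        for (Map.Entry<String, Integer> entry : replicasPerDataCentre.entrySet()) {
            options.put(entry.getKey(), String.valueOf(entry.getValue()));
        }
        return options;
    }

    public static void main(String[] args) {
        System.out.println(simpleStrategy(1));

        Map<String, Integer> dataCentres = new HashMap<String, Integer>();
        dataCentres.put("datacenter1", 1); // example data centre name
        System.out.println(networkTopology(dataCentres));

        // With the Thrift classes, the resulting map would then be handed to the
        // keyspace definition, e.g. keyspaceDefinition.setStrategy_options(options).
        // That call is not exercised in this sketch.
    }
}
```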



Now remove the line keyspaceDefinition.setReplication_factorIsSet(false); and we're good to go.

The final code below throws no exceptions.



Hooray, it works!!! :-)

I will continue with a tutorial on how to connect with Hector (a Java Cassandra Client). Hector comes with wonderful features, such as Connection Pooling for Cassandra, JMX Support, etc. For more information on Hector, head over to the Hector site.


Have fun!

PS: A related StackOverflow question can be useful too, I hope. :-)

Comments

  1. Good Post. In addition to this, there is one more thing to know that if we use NetworkTopology then we have to set data center option like this:

    keyspaceDefinition.strategy_options.put("datacenter1", "1");
