Friday, April 4, 2014

Cassandra, Where is My Data?

Last time I shared some Java code that I whittled together to interact with Cassandra.  Now I want to learn a little more about what a Cassandra cluster actually does with my data.

As I have been playing with this project, I began to realize something: one of the issues with my real-life project is that the business application has to deal with operational details like maintaining a disaster recovery (DR) system.  To separate this detail from the business application, I am expecting Cassandra to handle putting copies of my data on other cluster members so that DR maintenance "just happens".

The sample code for this post and the companion posts is on github at https://github.com/fwelland/CassandraStatementTools.

Having realized this was really my goal, I wanted to make sure that my data was replicating to cover my DR needs.  So here is my naive mental image of what I thought was happening while loading some statements:


So How Do I Test That This Is The Case?

So first, here are the arguments for a load operation of the CLI program built with the code from the earlier post.

String margs[] = new String[]{"-node", "127.0.0.1",  
                              "-date", "2014-02-27", 
                              "-customerid", "4799",
                              "-statementtype", "9700", 
                              "-file", "/home/fwelland/Downloads/pdf-sample.pdf"};        

I hope, based on the earlier content, this is pretty self explanatory.   Why the string array? Oh, long story, but using NetBeans and Gradle there isn't a good or convenient way (that I know of) to pass CLI args to the "run" feature of NetBeans & G4NB, so I just hammered in a silly string array for some quick testing.
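For the curious, the quick-and-dirty harness amounts to something like the sketch below.  StatementLoader is a hypothetical name for the CLI entry point from the earlier post; swap in whatever main class the GitHub project actually uses.

// Quick-and-dirty test harness: hard-code the CLI arguments and delegate to the
// real entry point. "StatementLoader" is a hypothetical class name; use whatever
// main class the CLI project on GitHub actually provides.
public class LoadOneStatement
{
    public static void main(String[] args) throws Exception
    {
        String[] margs = new String[]{"-node", "127.0.0.1",
                                      "-date", "2014-02-27",
                                      "-customerid", "4799",
                                      "-statementtype", "9700",
                                      "-file", "/home/fwelland/Downloads/pdf-sample.pdf"};
        StatementLoader.main(margs);    // hypothetical entry point
    }
}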

So to see that my record was 'there', I used DevCenter and the "MyTestCluster" connection profile that I had set up previously.    Recall that this connection profile has all three nodes listed, and DevCenter probably just connects to the first 'up' node it finds.    A simple select * from statementarchive.statements; revealed a single record.   Good; that is what I was expecting.
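The same sanity check can be scripted with the DataStax Java driver (2.0-era API).  Here is a minimal sketch; the contact points, keyspace, and column names come from this post, and everything else is just boilerplate for illustration:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

// Connect the way the "MyTestCluster" profile does (all three nodes listed)
// and run the same select to confirm the record is visible.
public class CheckStatements
{
    public static void main(String[] args)
    {
        Cluster cluster = Cluster.builder()
                                 .addContactPoints("127.0.0.1", "127.0.0.2", "127.0.0.3")
                                 .build();
        Session session = cluster.connect();
        ResultSet rs = session.execute("select archived_statement_id, customer_id from statementarchive.statements");
        for (Row row : rs)
        {
            System.out.println(row.getUUID("archived_statement_id") + " -> customer " + row.getInt("customer_id"));
        }
        cluster.close();
    }
}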

Staying within DevCenter, let me stop a node and see what happens.   I shut down a node via CCM with the command:

/opt/ccm-master/ccm node1 stop

And then, back in DevCenter, I reissued the query:



Not what I was expecting.  OK, maybe if I connected to one of the nodes directly, I could select my record.   So I did this to get an inventory of the specs of each node in the cluster:

$ for idx in 1 2 3
> do
> /opt/ccm-master/ccm node${idx} show
> done

I got the following:

node1: DOWN
       cluster=MyTestCluster
       auto_bootstrap=False
       thrift=('127.0.0.1', 9160)
       binary=('127.0.0.1', 9042)
       storage=('127.0.0.1', 7000)
       jmx_port=7100
       remote_debug_port=0
       initial_token=-9223372036854775808
node2: UP
       cluster=MyTestCluster
       auto_bootstrap=False
       thrift=('127.0.0.2', 9160)
       binary=('127.0.0.2', 9042)
       storage=('127.0.0.2', 7000)
       jmx_port=7200
       remote_debug_port=0
       initial_token=-3074457345618258603
       pid=6035
node3: UP
       cluster=MyTestCluster
       auto_bootstrap=False
       thrift=('127.0.0.3', 9160)
       binary=('127.0.0.3', 9042)
       storage=('127.0.0.3', 7000)
       jmx_port=7300
       remote_debug_port=0
       initial_token=3074457345618258602
       pid=15166

Using this information, I added 'direct' connections in DevCenter to each of the three nodes, like this:


I tried again, but connecting to a specific node gave the same result.  I mused that DevCenter may be 'doing something' and that I should drop back to cqlsh; it provided the same results.
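One aside on 'direct' connections: DevCenter presumably sits on top of the DataStax Java driver, and with its default settings the driver discovers the rest of the cluster from a single contact point and can route queries to any node it finds, so a 'direct' connection may be less direct than it looks.  A small sketch of that, just to illustrate what the driver knows after being pointed at one node:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Host;
import com.datastax.driver.core.Session;

// Connect with only node2 as a contact point, then print the hosts the driver
// actually knows about. With default settings the driver discovers the whole
// cluster, so a "direct" connection can still route queries to other nodes.
public class DirectishConnection
{
    public static void main(String[] args)
    {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.2").build();
        Session session = cluster.connect("statementarchive");
        for (Host host : cluster.getMetadata().getAllHosts())
        {
            System.out.println("driver knows about: " + host.getAddress());
        }
        System.out.println("rows: " + session.execute("select count(*) from statements").one().getLong(0));
        cluster.close();
    }
}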

How Do I Find Out Where My Record Is? 

Puzzled, I wanted to know exactly which node had my record and why I couldn't seem to fetch it, given my mental image from above.  After some googling, I learned a little about nodetool and its getendpoints command.  Grabbing the record's uuid from cqlsh, I came up with this:

$ /opt/apache-cassandra-2.0.5/bin/nodetool -p 7100 getendpoints statementarchive statements 04b51d75-8a18-4eae-b669-c89bce92b6d7

This command will "print the end points that owns the key", as the help text puts it.  I assumed 'endpoint' referred to a node; I was not quite sure what "owns the key" means, but I took it to mean 'has the record'.  The command suggested the key was on 127.0.0.2, or node2.
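As a client-side cross-check, the Java driver can answer the same "who has this key" question from code.  A sketch, assuming the 2.0 driver exposes a Metadata.getReplicas(keyspace, partitionKey) method; the uuid is the one pulled from cqlsh above, serialized into its standard 16-byte form:

import java.nio.ByteBuffer;
import java.util.UUID;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Host;

// Ask the driver which nodes hold a given partition key -- roughly what
// "nodetool getendpoints" reports. Assumes Metadata.getReplicas(keyspace, key)
// is available in this driver version.
public class WhoHasMyRecord
{
    public static void main(String[] args)
    {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        cluster.init();    // populate cluster metadata without opening a session

        UUID id = UUID.fromString("04b51d75-8a18-4eae-b669-c89bce92b6d7");
        ByteBuffer key = ByteBuffer.allocate(16);   // a uuid serializes to 16 bytes
        key.putLong(id.getMostSignificantBits());
        key.putLong(id.getLeastSignificantBits());
        key.flip();

        for (Host host : cluster.getMetadata().getReplicas("statementarchive", key))
        {
            System.out.println("replica: " + host.getAddress());
        }
        cluster.close();
    }
}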

Node2?  That doesn't fit my mental image above!  And if node2 has my record, why couldn't I get it when node1 was down?

Replicas & Replication Factor

The first 'wrong turn' I made was in how I created my keyspace and table, and in the mental image from above.   Recall my keyspace creation:

create KEYSPACE  statementarchive 
WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };

In Cassandra speak, a copy of a record is called a replica.  A replication factor of 1 means there is only a single replica of each record in the cluster, so if the one node holding that replica goes down, the record is unavailable until the node comes back.   This description in the DataStax documentation gives a pretty simple run-down of replicas and replication factors; go read it!
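A quick way to double-check what replication settings a keyspace actually has (rather than what I think I created it with) is to read them back from the cluster.  A sketch, assuming Cassandra 2.0's system.schema_keyspaces table:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

// Read the replication settings back from the cluster itself. In Cassandra 2.0
// the keyspace definitions live in system.schema_keyspaces; strategy_options
// contains the replication_factor as JSON-ish text.
public class ShowReplication
{
    public static void main(String[] args)
    {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect();
        Row row = session.execute(
            "select strategy_class, strategy_options from system.schema_keyspaces " +
            "where keyspace_name = 'statementarchive'").one();
        System.out.println(row.getString("strategy_class") + " " + row.getString("strategy_options"));
        cluster.close();
    }
}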

To "correct this problem" I used DevCenter connected to MyTestCluser to execute the following:

DROP TABLE statementarchive.statements;

DROP KEYSPACE  statementarchive;

create KEYSPACE  statementarchive 
WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 3 };

CREATE TABLE statementarchive.statements (         
    archived_statement_id uuid,         
    customer_id int,         
    statement_type text,         
    statement_filename text,         
    year int,         
    month int,         
    day int,         
    statement blob,         
    primary key (archived_statement_id));

I then ran the same statement load action mentioned above and checked with a simple select that I did have one record loaded.    Let's see what nodetool says.

$ /opt/apache-cassandra-2.0.5/bin/nodetool -p 7100 getendpoints statementarchive statements 0150eea3-7977-445f-acac-22e5887ef8d8
127.0.0.2
127.0.0.3
127.0.0.1

Cool!  The record seems to be on all three nodes now.    No surprises: with all three nodes up, I can see my record when connecting to each node directly.    After shutting down node1 and trying again, I could still CQL out my record from the cluster and directly from node2 and node3!

So now the mental model of the write operation from above seems to be in sync with what the cluster is doing.  But I have some lingering questions.  Why did my record originally go to node2 when I connected to node1 to load the statement?    And why couldn't I get that record from node2 or node3 when node1 was down?

In future posts I will try to learn more so I can answer those questions.  Further, I will review more about the CLI I built.
