Simple Cassandra Instance In AWS EC2

I’ve been doing Cassandra development in an enterprise setting for quite a while now. Being a good little development shop (Expedia), we had Chef set up to deploy stuff for us, including Cassandra. Chef is great at doing just that. The downside is that you forget the details of how to install Cassandra manually. Here are the steps to spin up an EC2 instance and get a single instance of Cassandra running on it.

AWS setup

First of all, you must have an Amazon Web Services (AWS) account to do this. Just do it; you’ll thank me once you see all the cool things you can do with it. Once you’re signed up, log into the AWS console and go to the EC2 dashboard. Click on the big “Launch Instance” button (I know, they make it easy, huh). My personal preference is CentOS, but you can pick whatever flavor of Linux you like. I typically pick from the AWS Marketplace rather than the Community AMIs area. I know I can trust the Marketplace images, whereas I can’t trust the community ones not to have extra bloat pre-installed. From the Marketplace, I select the latest version of CentOS, which in this case is CentOS 6.5.

The next step is to select your instance size. This decision can affect your wallet greatly. AWS offers a free tier that lets new users run a micro instance non-stop for free for the first year. Since I’m way past my first year on AWS, I pay a couple of cents per hour instead. But if you’re not careful, you can rack up a pretty good-sized bill. I went with an m1.small instance.

Connect to EC2

Once you have your Linux VM up and running in EC2, you’ll need to connect to it. If you’re on a Mac, you can simply “ssh” into it. If you’re on a PC, you’ll need to download a third-party tool to connect. I like using “cmder”, a console emulator for Windows. It’s probably the closest thing to “ssh-ing” from a Mac. The following is my connection. Note that I’m passing it a pem file. This is the private key from the key pair you set up in EC2 when launching the instance. I highly advise using this method to get in and out of EC2 instances.

ssh -i c:\adamhutson.pem
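On a Mac or Linux shell, the connection step can be wrapped in a small helper. This is a minimal sketch. `ec2_ssh` and `lock_key` are hypothetical helper names, and the key path, login user, and host are placeholders you’d replace with your own. The login user depends on the AMI you picked. The helper locks down the key file first, because ssh ignores a private key that other users can read.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of any AWS tooling.

# ssh ignores private keys that other users can read,
# so restrict the .pem file to owner-read-only first.
lock_key() {
  chmod 400 "$1"
}

# Usage (placeholders): ec2_ssh /path/to/key.pem user@public-dns-of-your-instance
ec2_ssh() {
  local key="$1" login="$2"
  lock_key "$key" || return 1
  ssh -i "$key" "$login"
}
```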

Java setup

First thing, does your VM have Java 7 installed on it?  Run the following to find out:

java -version

If it’s not installed, as is the case with most vanilla CentOS images, you’ll receive the following response:

-bash: java: command not found
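If you want to script this check, note two quirks. First, “java -version” prints to stderr, not stdout. Second, Java 7 reports its version in the old “1.x” scheme. Here is a minimal sketch, with hypothetical function names, that pulls out the major version:

```shell
#!/usr/bin/env bash
# Print the version string reported by `java -version` (e.g. 1.7.0_67).
# java writes this to stderr, hence the 2>&1. Returns 1 if java is missing.
installed_java_version() {
  command -v java >/dev/null 2>&1 || return 1
  java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}'
}

# Reduce a version string to its major number:
# "1.7.0_67" -> 7 (legacy 1.x scheme), "11.0.2" -> 11.
java_major() {
  local v="$1"
  case "$v" in
    1.*) v="${v#1.}" ;;
  esac
  echo "${v%%[._]*}"
}

# Example: decide whether the JDK install is needed.
if v="$(installed_java_version)" && [ "$(java_major "$v")" -ge 7 ]; then
  echo "Java $v found"
else
  echo "Java 7 not found; install the JDK"
fi
```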

So we’ll need to download the JDK for Java 7 and install it.  Run the following:

cd /opt/

curl -b oraclelicense=accept-securebackup-cookie -O -L

tar xzf jdk-7u67-linux-x64.tar.gz

cd /opt/jdk1.7.0_67/

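So now, we’ve downloaded and extracted the JDK into the /opt/ folder and navigated into the new JDK folder. Next, register the JDK’s java, jar and javac binaries with the alternatives system (run these as root). When “alternatives --config java” prompts you, pick the entry for /opt/jdk1.7.0_67/bin/java.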

alternatives --install /usr/bin/java java /opt/jdk1.7.0_67/bin/java 2

alternatives --config java

alternatives --install /usr/bin/jar jar /opt/jdk1.7.0_67/bin/jar 2

alternatives --install /usr/bin/javac javac /opt/jdk1.7.0_67/bin/javac 2

alternatives --set jar /opt/jdk1.7.0_67/bin/jar

alternatives --set javac /opt/jdk1.7.0_67/bin/javac
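The alternatives calls above all follow one pattern, so they can be collapsed into a loop. This sketch differs from the steps above in one way: it uses “alternatives --set” for java too, the non-interactive counterpart of “--config java”. The names register_jdk and run, and the DRY_RUN switch, are hypothetical. The commands need root, unless you only print them with DRY_RUN=1.

```shell
#!/usr/bin/env bash
# Run a command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Register java, jar and javac from the given JDK with priority 2,
# then point /usr/bin/<tool> at it without the interactive --config prompt.
register_jdk() {
  local jdk="$1" tool
  for tool in java jar javac; do
    run alternatives --install "/usr/bin/$tool" "$tool" "$jdk/bin/$tool" 2
    run alternatives --set "$tool" "$jdk/bin/$tool"
  done
}

# Usage (as root): register_jdk /opt/jdk1.7.0_67
# Preview only:    DRY_RUN=1 register_jdk /opt/jdk1.7.0_67
```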

Let’s re-check the version of Java now that we have JDK 7 installed:

java -version

You should now receive the following:

java version "1.7.0_67"
Java(TM) SE Runtime Environment (build 1.7.0_67-b01)
Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)

Now we need to set up some environment variables:

export JAVA_HOME=/opt/jdk1.7.0_67

export JRE_HOME=/opt/jdk1.7.0_67/jre

export PATH=$PATH:/opt/jdk1.7.0_67/bin:/opt/jdk1.7.0_67/jre/bin
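Those export lines only last for the current shell session. One common way to make them stick on CentOS is a script in /etc/profile.d/, which login shells source. This is a minimal sketch. The helper name write_java_profile and the file name java.sh are hypothetical choices, and writing to /etc/profile.d/ requires root.

```shell
#!/usr/bin/env bash
# Write the JAVA_HOME/JRE_HOME/PATH exports to a profile script.
# Usage: write_java_profile /opt/jdk1.7.0_67 [target-file]
write_java_profile() {
  local jdk="$1" target="${2:-/etc/profile.d/java.sh}"
  # $jdk is expanded now; \$PATH is escaped so it expands at login time.
  cat > "$target" <<EOF
export JAVA_HOME=$jdk
export JRE_HOME=$jdk/jre
export PATH=\$PATH:$jdk/bin:$jdk/jre/bin
EOF
}

# Usage (as root), then log out and back in, or source the file:
#   write_java_profile /opt/jdk1.7.0_67 && . /etc/profile.d/java.sh
```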

Cassandra install

We need a place to store the Cassandra files that we will be downloading. I chose to create a generic “cassandra” folder in the root.

mkdir /cassandra

cd /cassandra

There are two different ways to run Cassandra.  The first is to download the tar and run it manually.  The other is to get an rpm and install it as a service.

Option 1: Manually

curl -OL

tar xzf apache-cassandra-2.1.0-bin.tar.gz

cd apache-cassandra-2.1.0

bin/cassandra -f

Note the -f option that we passed to the Cassandra script. This tells Cassandra to run in the foreground and print its log messages to the console, which is really nice for seeing what is going on. You can also leave the -f option off; Cassandra then runs in the background and you won’t see the output. With -f, you should see a ton of log messages. Once startup finishes, the output stops, and the last few lines should look like the following (note that the last line says “Listening for thrift clients”):

INFO 14:33:44 Starting listening for CQL clients on localhost/
INFO 14:33:44 Binding thrift service to localhost/
INFO 14:33:44 Listening for thrift clients...
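If you start Cassandra without -f, you won’t have that log output to watch, so it helps to have another way to tell when startup has finished. This sketch polls a TCP port using bash’s /dev/tcp redirection. The name wait_for_port is hypothetical. The sketch assumes Cassandra 2.1’s default ports (9042 for CQL clients, 9160 for Thrift) and that you run the check on the instance itself, since the log above shows Cassandra bound to localhost.

```shell
#!/usr/bin/env bash
# Poll host:port once per second until something accepts a TCP connection.
# Returns 0 once the port is open, 1 if it never opened within $3 attempts.
wait_for_port() {
  local host="$1" port="$2" attempts="${3:-60}" i
  for ((i = 0; i < attempts; i++)); do
    # bash opens a TCP socket when redirecting to /dev/tcp/HOST/PORT;
    # the subshell closes it again right away.
    if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
      return 0
    fi
    sleep 1
  done
  return 1
}

# Usage: bin/cassandra && wait_for_port localhost 9042 120 && echo "CQL port is up"
```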

Now that Cassandra is running in the foreground, we’ll need to open a new tab and ssh into the box again. If we hadn’t used -f, we could have reused the same ssh connection.

cd /cassandra/apache-cassandra-2.1.0
bin/cqlsh

We are now connected to that running instance of Cassandra via the cqlsh interface. The following is the first part of cqlsh’s response:

Connected to Test Cluster at
[cqlsh 5.0.1 | Cassandra 2.1.0 | CQL spec 3.2.0 | Native protocol v3]
Use HELP for help.
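To confirm that the node actually stores and returns data, you can feed cqlsh a small script with its -f option. The keyspace and table names here are hypothetical. SimpleStrategy with a replication factor of 1 is assumed because this is a single node. The helper write_smoke_cql is only illustrative.

```shell
#!/usr/bin/env bash
# Write a tiny CQL smoke test to the given file.
# smoke_test / greetings are hypothetical names; RF 1 suits a single node.
write_smoke_cql() {
  cat > "$1" <<'EOF'
CREATE KEYSPACE IF NOT EXISTS smoke_test
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
CREATE TABLE IF NOT EXISTS smoke_test.greetings (id int PRIMARY KEY, msg text);
INSERT INTO smoke_test.greetings (id, msg) VALUES (1, 'hello cassandra');
SELECT * FROM smoke_test.greetings;
EOF
}

# Usage (from /cassandra/apache-cassandra-2.1.0):
#   write_smoke_cql /tmp/smoke.cql && bin/cqlsh -f /tmp/smoke.cql
```

If the SELECT prints the inserted row, the node is accepting writes and serving reads.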

Stopping this installation of Cassandra is a two-step procedure. First, use ps and grep to find the pid it’s running under, and then kill that pid (1234 below is a placeholder for the pid you found).

ps aux | grep cassandra

kill 1234
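If you’d rather not copy pids by hand, “pgrep -f” can match Cassandra’s main class, org.apache.cassandra.service.CassandraDaemon, directly. This is a minimal sketch with a hypothetical stop_cassandra helper. The [C] bracket is the usual trick that keeps the pattern from matching the shell’s own command line.

```shell
#!/usr/bin/env bash
# Stop a Cassandra started manually with bin/cassandra (not the service).
stop_cassandra() {
  local pids
  # "[C]" matches a literal C but keeps this pattern from matching
  # a command line that contains the pattern text itself.
  pids="$(pgrep -f 'org\.apache\.cassandra\.service\.[C]assandraDaemon')" || {
    echo "Cassandra does not appear to be running" >&2
    return 1
  }
  # kill sends SIGTERM by default: a normal termination request, not kill -9.
  kill $pids   # unquoted on purpose: there may be more than one pid
}
```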

Option 2: As a service

curl -OL /noarch/cassandra21-2.1.0-1.noarch.rpm

rpm -i cassandra21-2.1.0-1.noarch.rpm

Next, start the service. You should receive the following, letting you know that the service has started:

service cassandra start
Starting Cassandra: OK

You can also query the service to see if it’s running:

service cassandra status

You should receive the following (with a different pid, of course):

cassandra (pid 24356) is running...

Let’s connect to it to see if it responds:

cd /usr/bin
cqlsh

You should receive the following, which tells you that you are now at the cqlsh prompt and can interact with your Cassandra data models:

Connected to Test Cluster at
[cqlsh 5.0.1 | Cassandra 2.1.0 | CQL spec 3.2.0 | Native protocol v3]
Use HELP for help.

It’s a service, so you use the normal service commands to stop it.

service cassandra stop

Start using Cassandra

That’s it.  Now go on your merry way and spin up your own.

House Of The Rising Scrum

Yes, I’m having fun with my blog titles by twisting classic song titles. Get used to it; I think I’ll keep doing it.

Fire Fighting

After a couple of weeks at the new job, I noticed that there wasn’t really any direction to how issues were being addressed. It seemed like whatever the current fire was, or whoever was pestering the loudest, got their issue addressed first. This also meant that issues already being worked on were dropped for the latest “emergency” issue. Not exactly an ideal way to develop software (or build anything, for that matter). It felt like being a firefighter, just answering the call for the latest fire that needed putting out.

Almost immediately, I started pushing to change the team dynamics. I wanted my team to use a Scrum board to track their issues. I also wanted them to have quick stand-up meetings every day to let each other know where they were with each issue. A few on the team had tried this approach before, on a previous project that used this method. Of those, half hated it and the other half loved it. However, all agreed that the implementation on that project was a failure.

With a little persuading, I was able to get them to try the process again. Now, introducing a development team to the Agile process can be a very polarizing activity. People either love it or they hate it. Rarely are there any “meh” attitudes towards using Agile. With this team, I already had a split decision on members buying into it. So, I decided to take it in baby steps.

Scrum Task Board

The first step is introducing the team to a Scrum Task board. What is a Scrum Task board, you ask? In its simplest form, it is a grid with rows for short-term tasks and columns that show progress towards completion. But in reality, it’s a quick visual display for the team to see where they stand with their current issues. It helps to show which issues are done, which issue you’re currently working on and which issue is next. Below is a very simple example (borrowed image):

A generic scrum task board

Choosing a medium for your Scrum task board is up to you and your team. First, you need to find a common area that everyone can easily see. You don’t want to put your task board in a far-flung or low-traffic area. It should be out in the open so that everyone, not just your team, can see what’s being worked on. Next, you need to decide what medium you’ll use for your tasks. The two most common that I’m aware of are Post-It notes and 3×5 index cards held up with thumb-tacks or magnets. While making this decision, you’ll also need to decide what surface to display your board on. This could be a window, a wall, the side of a filing cabinet, a whiteboard … creativity is your only limitation.

My personal choice for Scrum task board is to use Post-Its on a portable whiteboard with 1/4″ tape to draw out the grid.

Daily Stand-up Meeting

The second step is scheduling a 15-minute stand-up meeting for the team every morning. The purpose of this meeting is to tell each other what you completed yesterday, what you’re working on today and what blockers, if any, you currently have. This keeps team members abreast of each other’s issues and adds accountability for each team member. Below is an example of a stand-up at the task board (borrowed image):


With an Agile development methodology, work is broken down into sprints. A Sprint is a predefined length of time to accomplish a set number of issues. Now, I didn’t want my team to get too bogged down in all the aspects of Agile just yet. I had loosely used the word “Sprint” a couple of times and noticed a few puzzled stares. Without going into too much detail, I told them that the amount of time we allotted to our scrum board was called a Sprint, and that the length of that time was up to us, the team, to decide.

Early Adoption

From my experience, a sprint is ideally 2-3 weeks long. However, I wanted my team to get a feel for the complete cycle of work quickly, so I decided to have the team do a quick 5-day sprint. It was a Friday morning and we were deciding which issues we would work on, so it made sense to have the following week, Monday through Friday, be our first official Sprint. Remember, some of the team had ill feelings toward the Scrum process. So I wanted to put some early positive points on the board, hopefully leading to a long-standing Agile process.

Sprint 1: 5 Days

Because of existing projects, only three of us were available to work on issues during this first sprint. Luckily, the two other developers and I were already fans of Agile. So hopefully, we can show the other members a positive, unified example of a successful scrum. We accepted 13 issues into the sprint. At the end of it, we were able to deliver 8 issues to our production environment. The remaining 5 issues were descoped due to external blockers or scope creep. Since we don’t have a formal Product Owner monitoring a product backlog, we are relying on an issue tracking system that allows the reporter to alter the problem and scope at any time. We’ll need to work on that aspect in the next sprint and only accept issues that are more stable.

Sprint 2: 10 Days

After the first successful sprint, we lengthened the sprint to a full two weeks, or 10 working days. It’s still only the three of us available to work, but so far, the two others are feeling more confident about their work and what’s expected. We have accepted 10 issues into the sprint this time. With a longer sprint, we felt we could deliver a couple of longer, more time-consuming issues. Hopefully, this sprint will also be a success.

Things To Come

Once the team is comfortable with our board and stand-ups, I’ll introduce them to retrospective meetings, planning poker, and hopefully find some more team members to add to the sprint process.