Data Architect and DataOps Engineer

Cloning Cassandra Clusters The Fast Way

Cloning Cassandra clusters is not something that gets talked about much in the community. Do a quick Google search on “Clone Cassandra” and there aren’t many hits, mostly just the DataStax docs and a few StackOverflow questions. Granted, cloning is really just an exercise in backup and restore, but it’s still a valid edge use case. The good news is, if you’re taking backups, you’re halfway there. This post will walk you through the steps to clone your Cassandra data the fast way.

Sizing Cassandra Data

Size does matter. One of the basic tasks of a Data Modeler is to know how big your data is going to get, which means providing estimates of how much disk your model will consume. Another basic task is to know how fast your data is flowing inbound. Figuring out how quickly, or slowly, it will grow is important for capacity planning.
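
As a rough illustration of that kind of estimate, here is a minimal sketch in Python. The function name, the parameters, and the sample numbers are all hypothetical, and the arithmetic is deliberately naive: it multiplies inbound rate by row size, time, and replication factor, and ignores compression, compaction headroom, and per-row storage overhead, all of which change the real figure.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def estimate_disk_bytes(rows_per_second, avg_row_bytes, replication_factor, days):
    """Naive cluster-wide disk estimate (hypothetical helper).

    Ignores compression, compaction headroom, and per-row overhead.
    """
    # Raw bytes written by the application over the period.
    raw_bytes = rows_per_second * avg_row_bytes * days * SECONDS_PER_DAY
    # Every row is stored once per replica.
    return raw_bytes * replication_factor


# Made-up example: 500 rows/s of 200-byte rows, RF=3, over 30 days.
estimate = estimate_disk_bytes(
    rows_per_second=500, avg_row_bytes=200, replication_factor=3, days=30
)
print(f"{estimate / 1e9:.1f} GB")
```

Treat the result as a starting point for capacity planning, not a measurement; checking it against the on-disk size of a real table is the only way to know how far off it is.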

Cassandra Physical Data Modeling

The task of a data modeler is to create order out of chaos without excessively distorting the truth. The finished product should be a data model that describes the structure, manipulation and integrity aspects of the data to be stored. To properly create a data model, the modeler will transform said chaos through three distinct stages. The first is a Conceptual Data Model, then a Logical Data Model, and lastly, a Physical Data Model.

Cassandra Partitioning & Clustering Keys Explained

Primary and Clustering Keys should be one of the very first things you learn about when modeling Cassandra data. Most people coming from a relational background automatically think, “Yeah, I know what a Primary Key is”, and gloss right over it. Because of this, there always seems to be a lot of confusion around the topic of Primary Keys in Cassandra. With this post, I will try to clear up that confusion.
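
To make the distinction concrete, here is a toy model in Python. It is not how Cassandra is implemented, and the `ToyTable` class and its method names are invented for illustration. It only shows the two roles: the partition key decides which partition a row lands in, and the clustering key decides the order of rows inside that partition. Writing the same key pair twice overwrites the earlier value, much like an upsert.

```python
from collections import defaultdict


class ToyTable:
    """Toy illustration only; not Cassandra's actual storage engine."""

    def __init__(self):
        # partition key -> {clustering key -> value}
        self.partitions = defaultdict(dict)

    def insert(self, partition_key, clustering_key, value):
        # The same (partition key, clustering key) pair overwrites the old value.
        self.partitions[partition_key][clustering_key] = value

    def read_partition(self, partition_key):
        # Rows come back ordered by clustering key within one partition.
        rows = self.partitions.get(partition_key, {})
        return sorted(rows.items())


table = ToyTable()
table.insert("sensor-1", 3, "c")
table.insert("sensor-1", 1, "a")
table.insert("sensor-2", 2, "b")
print(table.read_partition("sensor-1"))  # [(1, 'a'), (3, 'c')]
```

Reading one partition touches only the rows that share its partition key, which is why choosing that key well matters so much for query design.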

Cassandra Logical Data Modeling

The task of a data modeler is to create order out of chaos without excessively distorting the truth. The finished product should be a data model that describes the structure, manipulation and integrity aspects of the data to be stored. To properly create a data model, the modeler will transform said chaos through three distinct stages. The first is a Conceptual Data Model, then a Logical Data Model, and lastly, a Physical Data Model.

Benefits of choosing Cassandra for your data

Choosing the right tool for a project can sometimes be easy. For example, to drive a nail, the obvious choice would be a hammer. Other times, the choice is not as easy or obvious. A data engineer, like a good carpenter, needs to know that different tasks require different tools. Selecting which database will hold your data involves many factors that must be weighed before a decision is made. Apache Cassandra is one of the many tools that a data engineer should know how and when to use.

Cassandra Conceptual Data Modeling

The task of a data modeler is to create order out of chaos without excessively distorting the truth. The finished product should be a data model that describes the structure, manipulation and integrity aspects of the data to be stored. To properly create a data model, the modeler will transform said chaos through three distinct stages. The first is a Conceptual Data Model, then a Logical Data Model, and lastly, a Physical Data Model.

Update your Cassandra data model for 3.0

Apache Cassandra version 3.0 was released in November 2015. With the 3.0 release, some major updates to the storage engine and (long awaited) materialized views were introduced. For a Data Modeler, the addition of views is something to jump up and down about. However, over the last half year or so, many companies have been hesitant to make the jump from their trusted 2.1.x version. By the end of this post, I hope to help you understand:

Cassandra Data Model Basics

Data. Model. What an odd combination of words. To someone outside of the world of I.T., it doesn’t make much sense. But to those people who have written software or attempted to store information, it’s a phrase discussed all too often. A data model, as defined by Wikipedia, “is an abstract model that organizes elements of data and standardizes how they relate to one another and to properties of the real world”.

Restoring Cassandra Data

If you’ve ever lost your production data or discovered that it had been corrupted, then you know the importance of being able to restore that data. Restoring your Cassandra data can be quick and painless. The following three methods will help you to gain a better understanding of what’s involved. Before you can begin to restore data, though, you have to have data to restore. You are backing up your data, right?