Turbo

Reputation: 2560

How to migrate data from a Cassandra cluster of size N to a different cluster of size N+/-M

I'm trying to figure out how to migrate data from one Cassandra cluster to another Cassandra cluster of a different ring size, say from a 5-node cluster to a 7-node cluster.

I started looking at sstable2json, since it creates a JSON file for the SSTables on a specific Cassandra node. My thought was to do this for a column family on each node in the ring. So on a 5-node ring, this would give me 5 JSON files, one for the column family data that resides on each node.

Then I'd merge the JSON files into one file and use json2sstable to import it into a new cluster of size, let's say, 7. I was hoping that Cassandra would then replicate/balance the data evenly across the nodes in the ring, but I just read that SSTables are immutable once written. So if I did what I just described, I'd end up with a ring with all the data in my column family on one node.

So can anyone help me figure out the process for migrating data from one cluster to a different cluster of a different ring size?

Upvotes: 7

Views: 9532

Answers (4)

John

Reputation: 67

You could proceed as follows:

  1. Join the 7 new nodes to the existing 5-node cluster, giving each node its own ring token. At this point you have a 12-node cluster.
  2. Remove the 5 old nodes from the cluster formed in step 1.
  3. Reassign the ring tokens of the 7 remaining nodes once the 5 old nodes are gone.
  4. Run repair on the 7-node cluster.

Upvotes: 0

jbellis

Reputation: 19377

Better: use bin/sstableloader on the sstables from the old ring to stream them to the new one.

Normally sstableloader is used in a sequence like this:

  1. Create sstables locally using SSTableWriter
  2. Use sstableloader to stream the data in the sstables to the right nodes (bin/sstableloader path-to-directory-full-of-sstables). The directory name is assumed to be the keyspace, which will be the case if you point it at an existing Cassandra data directory.

Since you're looking to stream data from an existing cluster A to a new cluster B, you can skip straight to running sstableloader against the data on each node in cluster A.

More details on using sstableloader in this blog post.
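To make the invocation concrete, here is a minimal Python sketch that only builds the sstableloader command line for one node's keyspace directory; it does not run anything. The data path and host addresses are made up for illustration. The `-d` option (comma-separated initial hosts of the target ring) is an assumption about your Cassandra version: early releases of the tool read the target cluster from the local cassandra.yaml instead, so check `sstableloader --help` first.

```python
import shlex


def sstableloader_command(keyspace_dir, target_hosts, loader="bin/sstableloader"):
    """Build the argv for one sstableloader run (nothing is executed here)."""
    if not target_hosts:
        raise ValueError("need at least one node of the target cluster")
    # Assumption: this version accepts -d with a comma-separated host list.
    return [loader, "-d", ",".join(target_hosts), keyspace_dir]


# Hypothetical data directory on a node of cluster A and two nodes of cluster B.
cmd = sstableloader_command(
    "/var/lib/cassandra/data/Keyspace1", ["10.0.1.1", "10.0.1.2"]
)
print(shlex.join(cmd))
```

You would repeat this once per node of cluster A (or per copied data directory), passing the resulting list to `subprocess.run` if you want to drive it from a script.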

Upvotes: 9

Zanson

Reputation: 4031

You don't need to use sstable2json. If you have the space you can:

  1. get all the sstables from all of the nodes on the old ring
  2. put them all together on each of the new servers (renaming any which have the same names)
  3. run nodetool cleanup on each node in the new ring and they will throw away the data that doesn't belong to them.
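The renaming in step 2 has to keep all component files of one sstable (Data, Index, Filter, ...) together. Below is a rough Python sketch that plans such renames by giving every source sstable a fresh generation number. It assumes file names of the form `<prefix>-<generation>-<Component>`, such as `Standard1-hd-5-Data.db`; the exact layout differs between Cassandra versions, so treat the pattern and the sample names as hypothetical. It also assumes the target directory starts out empty and only computes the mapping, without touching any files.

```python
import re
from collections import defaultdict

# Assumed name shape: <prefix>-<generation>-<Component>
NAME = re.compile(r"^(?P<prefix>.+)-(?P<gen>\d+)-(?P<component>[^-]+)$")


def plan_renames(files_by_node):
    """Map (node, old_name) -> new_name, one fresh generation per source sstable."""
    next_gen = defaultdict(int)  # last generation handed out, per prefix
    assigned = {}                # (node, prefix, old generation) -> new generation
    plan = {}
    for node in sorted(files_by_node):
        for name in sorted(files_by_node[node]):
            m = NAME.match(name)
            if m is None:
                raise ValueError(f"unrecognised sstable file name: {name}")
            key = (node, m["prefix"], m["gen"])
            if key not in assigned:
                next_gen[m["prefix"]] += 1
                assigned[key] = next_gen[m["prefix"]]
            plan[(node, name)] = f'{m["prefix"]}-{assigned[key]}-{m["component"]}'
    return plan


# Two old nodes that both happen to hold generation 5 of the same column family.
plan = plan_renames({
    "node1": ["Standard1-hd-5-Data.db", "Standard1-hd-5-Index.db"],
    "node2": ["Standard1-hd-5-Data.db"],
})
for (node, old), new in sorted(plan.items()):
    print(node, old, "->", new)
```

Components from the same node and generation keep a shared new generation, while files from different nodes no longer collide.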

Upvotes: 0

sdolgy

Reputation: 7001

I would venture to say that this isn't as big of a problem as it may seem.

  1. Create your new ring and define the tokens for each node appropriately as per http://wiki.apache.org/cassandra/Operations#Token_selection
  2. Import data into the new ring.
  3. The ring will balance itself based on the tokens you have defined http://wiki.apache.org/cassandra/Operations#Import_.2BAC8_export
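For step 1, evenly spaced tokens can be computed with a few lines of Python. This is a sketch that assumes the RandomPartitioner, whose token space runs from 0 to 2**127; other partitioners use different token ranges, so adjust `ring_size` (a name chosen here for illustration) accordingly.

```python
def balanced_tokens(node_count, ring_size=2**127):
    """Return one evenly spaced initial token per node (RandomPartitioner assumed)."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    # Node i gets the token i/node_count of the way around the ring.
    return [i * ring_size // node_count for i in range(node_count)]


# Tokens for the 7-node target ring from the question.
for i, token in enumerate(balanced_tokens(7)):
    print(f"node {i}: initial_token = {token}")
```

Each value would go into the `initial_token` setting of one new node before it first starts.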

Upvotes: -1
