Reputation: 73
Hi I want to migrate data from my cassandra cluster to another cassandra cluster. I have seen many posts suggesting various methods but are not very clear or have limitations. The methods seen are as follows:
Any help would be appreciated.
Upvotes: 1
Views: 7121
Reputation: 410
If you are able to set up the target cluster with exactly the same topology as the source cluster, the fastest way may be to simply copy the data files from the source to the target cluster, since this avoids the overhead of processing the data to redistribute it to different nodes. In order for this to work, your target cluster must have the same number of nodes, the same rack configuration, and even the same tokens assigned to each node.
To get the tokens for a source node, you can run nodetool info -T | grep Token | awk '{print $3}' | tr '\n' , | sed 's/,$/\n/'
. You can then copy the comma-separated list of tokens from the output and paste it into the initial_token
setting in your target node's cassandra.yaml. Once you start the node, check its tokens using nodetool info -T
to verify that it has the correct tokens. Repeat these steps for each node in the target cluster.
Once you have all of your target nodes set up with exactly the same tokens, DC, and racks as the source cluster, take a snapshot of the desired tables on the source cluster and copy the snapshots to the corresponding node's data directories on the target cluster. DataStax OpsCenter can automate the process of backing up and restoring data and will use direct copying for clusters with the same topology. It appears that medusa can do this too though I have not used this tool before.
Upvotes: 1
Reputation: 87069
It's really a question that requires more information to provide definitive answer. For example, do you need to keep the metadata, such as, WriteTime and TTLs on data, or not? Does the destination cluster has the same topology (number of nodes, token allocation, etc.).
Basically, you have following options:
sstableloader
- tool shipped with Cassandra itself that is used for restoring from backups, etc. To perform data migration you need to create a snapshot of the table to load (using nodetool snapshot
), and run sstableloader
on that snapshot. Main advantage is that it will keep metadata (TTL/WriteTime). Main disadvantage is that you need to perform taking snapshot/loading on all nodes of the source cluster, and you need to have exactly the same schema and partitioner in the destination cluster;cqlsh
's COPY
command, it's heavily optimized for loading/unloading of big amounts of data. It works with Cassandra 2.1+ and most DSE versions (except ancient ones). Upvotes: 6