Sunil Kumar

Reputation: 1349

Keyspace schema import and export in Cassandra

I have a Cassandra 1.1.2 installation on my system as a single-node cluster with three keyspaces: hotel, student and employee. I want to dump the keyspace schema of hotel, along with its column family data if possible, and restore the dump on another Cassandra cluster. Can anyone suggest in detail how I should do this?

Upvotes: 7

Views: 10981

Answers (2)

user3360277

Reputation: 21

I don't recommend using sstable2json and json2sstable to load a large amount of data. They use the Jackson API to build the dataset and transform it to JSON, which means loading all of the data into memory to create a single JSON representation.

That is fine for a small amount of data, but imagine loading a large dataset of more than 40 million rows, about 25 GB of data: these tools simply don't work well. I already asked the DataStax guys about it and got no clarification.

For large datasets, simply copying the Cassandra data files from one cluster to the other may solve the problem. In my case I was trying to migrate from a Cassandra 1.0.6 cluster to a 1.2.1 cluster, and the data files were not compatible between those versions.
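For what it's worth, when both clusters run the same Cassandra version, a raw file copy can work. A rough sketch, assuming the default data directory and placeholder host names (none of these paths come from the answer itself):

```shell
# On the source node: flush memtables so all data is on disk as SSTables
nodetool -h source-node flush hotel

# Copy the keyspace's SSTable files into the same data directory on the target node
# (default data directory assumed; adjust to match your cassandra.yaml)
scp /var/lib/cassandra/data/hotel/* target-node:/var/lib/cassandra/data/hotel/

# On the target node: make Cassandra pick up the new SSTables without a restart
nodetool -h target-node refresh hotel rooms
```

Note that the schema (keyspace and column family definitions) must already exist on the target cluster; nodetool refresh only loads data files.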

So what is the solution? I'm writing my own export/import tool to solve this. I hope to post a link to the tool soon.

Upvotes: 2

Tamil
Tamil

Reputation: 5358

You can use the sstable2json and json2sstable Cassandra tools.

Check out Datastax documentation on the same and this too

Usage: sstable2json [-f outfile] <sstable> [-k key [-k key [...]]]
Usage: json2sstable -K keyspace -c column_family <json> <sstable>
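For example, to export one column family from the hotel keyspace and load it into another cluster, something like the following would work. The column family name (rooms) and the SSTable paths are assumptions based on Cassandra 1.1's default file layout, not taken from the question:

```shell
# Flush first so the data is in SSTables on disk
nodetool flush hotel

# Export the 'rooms' column family of keyspace 'hotel' to JSON
sstable2json /var/lib/cassandra/data/hotel/rooms-hc-1-Data.db > rooms.json

# On the target cluster (schema must already exist), rebuild an SSTable from the JSON
json2sstable -K hotel -c rooms rooms.json /var/lib/cassandra/data/hotel/rooms-hc-1-Data.db
```

Repeat per SSTable and per column family; for anything beyond small keyspaces, see the caveats in the other answer about memory usage.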

You can always execute cassandra-cli commands from a file:

cassandra-cli -h HOST -p PORT -f fileName

You can put all your create statements into a file and execute it with this command.

To get CLI scripts that recreate your keyspaces and column families, use the following command in the cassandra-cli interface:

show schema
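Putting the two together, a possible schema export/import flow looks like this. The host names are placeholders, and piping `show schema` through the CLI non-interactively is an assumption, not something stated in the answer:

```shell
# Capture the schema as CLI statements from the source node
echo "show schema;" | cassandra-cli -h source-host -p 9160 > schema.cli

# Replay the captured statements on the target node
cassandra-cli -h target-host -p 9160 -f schema.cli
```

You may need to strip the interactive banner and prompt lines from schema.cli before replaying it.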

But in case you want to create a cluster of two nodes, you don't need to do any of the above. Just starting the other node with a different token range and the same cluster name will do; Cassandra will internally stream the data and schema information.
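In cassandra.yaml terms, the second node would roughly need the following settings. All the values here are illustrative assumptions, not from the answer:

```yaml
# cassandra.yaml on the new node (illustrative values)
cluster_name: 'MyCluster'   # must match the existing node's cluster name
initial_token: 85070591730234615865843651857942052864   # a different token range
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "192.168.1.10"   # IP of the existing node
```

On startup the new node contacts the seed, joins the ring at its token, and receives the schema and its share of the data automatically.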

Upvotes: 6
