Kevin
Kevin

Reputation: 3730

What is a good Bulk data loading tool for Cassandra

I'm looking for a tool to load CSV into Cassandra. I was hoping to use RazorSQL for this but I've been told that it will be several months out.

What is a good tool?

Thanks

Upvotes: 3

Views: 3253

Answers (3)

ezrarze
ezrarze

Reputation: 41

Upvotes: 0

samarth
samarth

Reputation: 4024

1) If you have all the data to be loaded in place you can try sstableloader(only for cassandra 0.8.x onwards) utility to bulk load the data.For more details see:cassandra bulk loader

2) Cassandra has introduced BulkOutputFormat bulk loading data into cassandra with hadoop job in latest version that is cassandra-1.1.x onwards. For more details see:Bulkloading to Cassandra with Hadoop

Upvotes: 2

DNA
DNA

Reputation: 42617

I'm dubious that tool support would help a great deal with this, since a Cassandra schema needs to reflect the queries that you want to run, rather than just being a generic model of your domain.

The built-in bulk loading mechanism for cassandra is via BinaryMemtables: http://wiki.apache.org/cassandra/BinaryMemtable

However, whether you use this or the more usual Thrift interface, you still probably need to manually design a mapping from your CSV into Cassandra ColumnFamilies, taking into account the queries you need to run. A generic mapping from CSV-> Cassandra may not be appropriate since secondary indexes and denormalisation are commonly needed.

Upvotes: 1

Related Questions