user3554618

Reputation: 43

COPY command row size limit in Cassandra

Could anyone tell me the maximum size (number of rows or file size) of a CSV file that can be loaded efficiently into Cassandra with the COPY command? Is there a limit? If so, is it a good idea to break a large file into multiple smaller files and load them one by one, or is there a better option? Many thanks.

Upvotes: 1

Views: 2201

Answers (2)

phact

Reputation: 7305

Check out the JIRA tickets CASSANDRA-9303 and CASSANDRA-9302.

Also check out Brian's cassandra-loader:

https://github.com/brianmhess/cassandra-loader

Upvotes: 1

dtrihinas

Reputation: 466

I've run into this issue before. At least in my experience, neither the DataStax nor the Apache documentation clearly states a maximum size. It may simply be limited by the resources of your PC, server, or cluster (e.g., CPU and memory).

However, an article by jgong states that you can import up to 10MB; in my case the practical limit was around 8.5MB. The Cassandra 1.2 documentation states that COPY can import a few million rows and that you should use the bulk loader for heavier workloads.

All in all, I suggest importing via multiple CSV files (just don't make them too small, or you'll be opening and closing files constantly). That way you can keep track of the data being imported and find errors more easily. With one big file, the load can run for an hour, fail, and force you to start over; with multiple files, you don't need to redo the ones that were already imported successfully. Not to mention duplicate-key errors.
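A minimal sketch of the multiple-files approach, assuming Python 3 and a CSV file with a header row. The helper names `split_csv` and `copy_statements`, the chunk size `max_rows`, the table `my_keyspace.my_table`, and the column names are all hypothetical placeholders, not part of Cassandra or cqlsh. The script splits the file into numbered chunks (repeating the header in each) and builds one cqlsh `COPY ... FROM ... WITH HEADER = true;` statement per chunk, which you could then run one at a time and re-run only for chunks that failed.

```python
import csv
import os
import tempfile


def split_csv(path, out_dir, max_rows=100_000, has_header=True):
    """Split a CSV file into chunk files of at most max_rows data rows each.

    The header row (if any) is repeated in every chunk so each chunk can be
    loaded independently. Returns the chunk file paths in order.
    """
    os.makedirs(out_dir, exist_ok=True)
    base = os.path.splitext(os.path.basename(path))[0]
    chunk_paths = []
    out_file = None
    writer = None
    rows_in_chunk = 0
    try:
        with open(path, newline="") as src:
            # csv.reader yields whole records, so quoted fields that contain
            # newlines are never cut in half between chunks.
            reader = csv.reader(src)
            header = next(reader, None) if has_header else None
            for row in reader:
                # Start a new chunk file when none is open or the current one is full.
                if writer is None or rows_in_chunk >= max_rows:
                    if out_file is not None:
                        out_file.close()
                    chunk_path = os.path.join(
                        out_dir, f"{base}_part{len(chunk_paths):04d}.csv"
                    )
                    out_file = open(chunk_path, "w", newline="")
                    writer = csv.writer(out_file)
                    if header is not None:
                        writer.writerow(header)
                    chunk_paths.append(chunk_path)
                    rows_in_chunk = 0
                writer.writerow(row)
                rows_in_chunk += 1
    finally:
        if out_file is not None:
            out_file.close()
    return chunk_paths


def copy_statements(chunk_paths, table, columns):
    """Build one cqlsh COPY FROM statement per chunk (table/columns are placeholders)."""
    cols = ", ".join(columns)
    return [f"COPY {table} ({cols}) FROM '{p}' WITH HEADER = true;" for p in chunk_paths]


if __name__ == "__main__":
    # Demo on a small generated file: 5 data rows split into chunks of 2 rows.
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "data.csv")
        with open(src, "w", newline="") as f:
            w = csv.writer(f)
            w.writerow(["id", "name"])
            for i in range(5):
                w.writerow([i, f"name{i}"])
        chunks = split_csv(src, os.path.join(tmp, "chunks"), max_rows=2)
        for stmt in copy_statements(chunks, "my_keyspace.my_table", ["id", "name"]):
            print(stmt)
```

Splitting by row count rather than by raw byte offsets keeps every CSV record intact. If you want to target a file size instead (e.g., staying under the ~8.5-10MB figures mentioned above), you would pick `max_rows` from your average row size.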

Upvotes: 1
