user2904488

Reputation: 11

Cassandra COPY command imports only part of the data

I am new to Cassandra and am trying to import data from a CSV file. First I created the table using:

create table cdma_mkt_bte (date_value timestamp primary key, region varchar, vendor varchar);

Then I loaded the data with:

copy cdma_mkt_bte (date_value, region, vendor) from '/usr/share/dse/bin/cdma_mkt_bte' with HEADER = TRUE;

The problem is that the CSV file has about 43,000 rows, but only 211 rows end up in Cassandra. I looked at rows 211 and 212 to see whether anything strange was going on, but they seem fine. Can you please help? Also, what other options are there for importing a CSV file into Cassandra?

Thank you! I would really appreciate the help.


Answers (2)

catpaws

Reputation: 2283

The options you can use for the COPY command are described in this doc:

http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/copy_r.html?scroll=reference_ds_mh1_1hs_xj__description_unique_3

Keep looking for a problem in the CSV file, and check for hidden characters at the end of lines. I recall a trailing blank space causing this kind of problem once. The bad line was not necessarily at exactly the position the COPY command reported; opening the CSV in Excel is what revealed it for me.
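If Excel is not handy, a short script can flag the same kind of hidden characters. This is a minimal sketch, not part of cqlsh: the helper name `find_suspicious_lines` is hypothetical, and its checks (trailing spaces or tabs, plus any non-printable or non-ASCII character such as a non-breaking space or a byte-order mark) are heuristics, not the exact rules COPY applies.

```python
import sys
from typing import Iterable, List, Tuple


def find_suspicious_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, reason) pairs for lines that may confuse a CSV loader.

    Hypothetical helper: these checks are heuristics, not cqlsh's own rules.
    """
    problems: List[Tuple[int, str]] = []
    for lineno, raw in enumerate(lines, start=1):
        # Remove only the line terminator so other trailing characters stay visible.
        line = raw.rstrip("\r\n")
        if line != line.rstrip(" \t"):
            problems.append((lineno, "trailing whitespace"))
        # Non-printable (e.g. tab, NBSP, BOM) or non-ASCII characters, deduplicated.
        bad = sorted({repr(ch) for ch in line if not ch.isprintable() or ord(ch) > 127})
        if bad:
            problems.append((lineno, "non-printable or non-ASCII characters: " + ", ".join(bad)))
    return problems


def report(path: str) -> None:
    # newline="" keeps line endings untranslated; errors="replace" turns
    # undecodable bytes into U+FFFD, which the non-ASCII check then flags.
    with open(path, newline="", encoding="utf-8", errors="replace") as f:
        for lineno, reason in find_suspicious_lines(f):
            print(f"line {lineno}: {reason}")


if __name__ == "__main__":
    if len(sys.argv) > 1:
        report(sys.argv[1])
    else:
        print("usage: python find_suspicious_lines.py CSV_PATH")
```

Run it against the same file passed to COPY (for example `/usr/share/dse/bin/cdma_mkt_bte`). Non-ASCII characters can be perfectly valid data, so treat the output as a list of places to inspect, not a list of errors.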


ashic

Reputation: 6495

Your primary key seems to be date_value alone. In Cassandra, every insert or update is effectively an upsert on the primary key: if two records have the same primary key, the second overwrites the first. If a record is uniquely identified by date_value + region + vendor, then your schema should look like:

create table cdma_mkt_bte (date_value timestamp, region varchar, vendor varchar, 
primary key (date_value, region, vendor));

Is this possibly the reason you're not getting the expected number of records?
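One way to test this before changing the schema is to replay the CSV into an in-memory dict keyed by the candidate primary key, which mimics the upsert behaviour: a later row with the same key replaces the earlier one. This is a minimal sketch; the helper names `surviving_rows` and `compare_keys` are hypothetical, and it assumes the CSV's header row names the columns exactly date_value, region and vendor (as the COPY ... WITH HEADER = TRUE command implies). Values are compared as raw strings, so two timestamps written differently in the file but parsed to the same value by Cassandra would be counted as distinct here.

```python
import csv
import sys
from typing import Dict, Iterable, Sequence, Tuple


def surviving_rows(rows: Iterable[Dict[str, str]], key_columns: Sequence[str]) -> int:
    """Count rows left after upserting each one by the given key columns.

    Approximation of primary-key upserts: values are compared as raw strings.
    """
    table: Dict[Tuple[str, ...], Dict[str, str]] = {}
    for row in rows:
        key = tuple(row[col] for col in key_columns)
        table[key] = row  # a later row with the same key replaces the earlier one
    return len(table)


def compare_keys(path: str) -> None:
    # Assumes a header row with the column names date_value, region, vendor.
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    print("rows in CSV:", len(rows))
    print("kept with PRIMARY KEY (date_value):",
          surviving_rows(rows, ["date_value"]))
    print("kept with PRIMARY KEY (date_value, region, vendor):",
          surviving_rows(rows, ["date_value", "region", "vendor"]))


if __name__ == "__main__":
    if len(sys.argv) > 1:
        compare_keys(sys.argv[1])
    else:
        print("usage: python compare_keys.py CSV_PATH")
```

If the count for date_value alone comes out close to the number of rows COPY actually loaded, duplicate timestamps are a likely explanation, and the compound key above should keep the rest.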

