Sarah K

Reputation: 73

Generate UUIDs automatically for all records of an existing dataset in Cassandra

I have an existing dataset of around 700,000 records in CSV format, and I have imported that data file into an Apache Cassandra table. The problem is the primary key. How can I automatically generate (upsert) a UUID into my primary key column for all of my records? I am using Cassandra 3.10.

Upvotes: 1

Views: 1647

Answers (1)

Luke Tillman

Reputation: 1385

Unfortunately, if you're using the COPY command you don't really have any options for generating UUIDs on the fly for your rows. I think you really have two options, both of which involve doing things programmatically to one extent or another:

  1. Do some pre-processing on your CSV file to generate and add a UUID to each row, writing out a new file with that additional field and UUID value for each row. It should be pretty straightforward to process the file, line by line, and generate those values using a small Python script or something similar. Then you can use the COPY command like before to import the data into Cassandra.
  2. Since you're already going to be writing some code, skip using the COPY command altogether and just write the code in Python (or Java or your language of choice) to read the file, parse each CSV line into values, generate a UUID for that row, and then INSERT the data into Cassandra using the appropriate driver for the programming language you're using.
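For option 1, a minimal pre-processing sketch in Python might look like the following. It assumes the CSV is UTF-8 encoded and has a header row; the function name `add_uuid_column`, the column name `id`, and the file names are placeholders chosen for illustration, not anything from your setup.

```python
import csv
import uuid


def add_uuid_column(src_path, dst_path, has_header=True, column_name="id"):
    """Copy src_path to dst_path, prepending a random UUID to every data row.

    Returns the number of data rows written.
    """
    count = 0
    with open(src_path, newline="", encoding="utf-8") as src, \
         open(dst_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        if has_header:
            header = next(reader, None)
            if header is not None:
                # Add a name for the new UUID column (hypothetical name).
                writer.writerow([column_name] + header)
        for row in reader:
            if not row:
                # Skip completely blank lines.
                continue
            # uuid4() is a random (version 4) UUID; its string form is what
            # cqlsh expects for a column of type uuid.
            writer.writerow([str(uuid.uuid4())] + row)
            count += 1
    return count
```

Calling `add_uuid_column("records.csv", "records_with_ids.csv")` writes the new file, which you can then load in cqlsh with something like `COPY my_keyspace.my_table (id, ...) FROM 'records_with_ids.csv' WITH HEADER = true;`, listing your own columns in the same order as the file.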

If you decide to go with option 2, you'll find a list of the DataStax drivers for Cassandra towards the bottom of this page, along with documentation for how to use them. Hope that helps!
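As a rough illustration of option 2, here is a sketch using the DataStax Python driver (`cassandra-driver`). The contact point `127.0.0.1`, the keyspace `my_keyspace`, the table `people` with its columns, and the file name `people.csv` are all made-up placeholders. It also assumes every non-key column is of type text, since the `csv` module yields strings; other column types would need converting before the insert.

```python
import csv
import uuid


def insert_csv_with_uuids(session, insert_cql, csv_path, has_header=True):
    """Insert every CSV row through `session`, using a fresh UUID as the first
    bound value. Returns the number of rows inserted."""
    # Prepare once, then bind values per row.
    prepared = session.prepare(insert_cql)
    count = 0
    with open(csv_path, newline="", encoding="utf-8") as src:
        reader = csv.reader(src)
        if has_header:
            next(reader, None)  # discard the header row
        for row in reader:
            if not row:
                continue  # skip blank lines
            # The driver accepts a uuid.UUID object for a uuid column.
            session.execute(prepared, [uuid.uuid4()] + row)
            count += 1
    return count


def main():
    # Requires: pip install cassandra-driver
    from cassandra.cluster import Cluster

    cluster = Cluster(["127.0.0.1"])          # assumed contact point
    session = cluster.connect("my_keyspace")  # hypothetical keyspace
    try:
        inserted = insert_csv_with_uuids(
            session,
            # Hypothetical table and columns; adjust to your schema.
            "INSERT INTO people (id, name, city) VALUES (?, ?, ?)",
            "people.csv",
        )
        print(f"inserted {inserted} rows")
    finally:
        cluster.shutdown()
```

Running `main()` inserts the rows one at a time, which is simple but not the fastest approach. For a file of about 700,000 rows, the driver's `cassandra.concurrent.execute_concurrent_with_args` helper is one way to speed things up.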

Upvotes: 2
