RJP

Reputation: 405

Importing JSON dataset into Cassandra

I have a dataset of about 80,000 records that I want to import into Cassandra. I only see documentation for importing CSV. Is it possible to import JSON as well?

Upvotes: 2

Views: 7609

Answers (3)

Alex Ott

Reputation: 87119

As of 2020, you can use the DataStax Bulk Loader utility (DSBulk) to load and unload Cassandra/DSE data in CSV and JSON formats. It is very flexible: it can load only part of the data, map JSON fields to table columns, and so on. It supports Cassandra 2.1+ and is very fast.

In the simplest case, the load command looks like this:

dsbulk load -k keyspace -t table -c json -url your_file.json

The DataStax blog has a series of six articles about DSBulk.
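The question does not say how the JSON file is laid out. Assuming it is one large JSON array, and that you want one JSON object per line instead (the layout the jq approach in another answer relies on, and as far as I know the one DSBulk reads by default), a minimal Python sketch of the conversion could look like this. The function name `array_to_json_lines` is made up for illustration. DSBulk also has a `connector.json.mode` setting that may make the conversion unnecessary, so check its documentation first.

```python
import json


def array_to_json_lines(text):
    """Turn a JSON array of objects into one compact JSON object per line."""
    records = json.loads(text)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    # json.dumps never emits raw newlines, so each record stays on one line
    return "".join(json.dumps(record) + "\n" for record in records)
```

For 80,000 records, reading the whole file into memory like this should be fine; a much larger file would need a streaming parser.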

Upvotes: 4

Artem Trunov

Reputation: 1415

DSBulk is the most complete solution, but you can also consider this trick: it converts JSON messages (one per line) to CSV on the fly, with no separate conversion step, and loads them into Cassandra using cqlsh:

cat file.json | jq -r '[.uid,.gender,.age] | @csv' | cqlsh -e 'COPY labdata.clients(uid,gender,age) from STDIN;'

Explanations:

This requires the jq utility, which on Ubuntu can be installed with apt install jq.

Here I have a file with the following messages:

{"uid": "d50192e5-c44e-4ae8-ae7a-7cfe67c8b777", "gender": "F", "age": 19}
{"uid": "d502331d-621e-4721-ada2-5d30b2c3801f", "gender": "M", "age": 32}

This is how I convert it to csv on the fly:

cat file.json | jq -r '[.uid,.gender,.age] | @csv'

where -r makes jq print each CSV line as raw text instead of a JSON-encoded string full of escaped \" characters. String values still come out quoted:

"d50192e5-c44e-4ae8-ae7a-7cfe67c8b777","F",19
"d502331d-621e-4721-ada2-5d30b2c3801f","M",32

Now, if you create a table clients in keyspace labdata for this data using cqlsh:

CREATE TABLE labdata.clients ( uid ascii PRIMARY KEY, gender ascii, age int);

then you should be able to run the COPY ... FROM STDIN command above.
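If jq is not available, the same conversion can be sketched in Python with only the standard library. This is an illustration rather than a drop-in replacement: the function name `json_lines_to_csv` is made up, and it assumes every line is a JSON object containing all of the requested fields.

```python
import csv
import io
import json


def json_lines_to_csv(lines, fields):
    """Convert an iterable of JSON-object lines to CSV text."""
    out = io.StringIO()
    # QUOTE_NONNUMERIC quotes strings and leaves numbers bare, like jq's @csv
    writer = csv.writer(out, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Raises KeyError if a record lacks one of the requested fields
        writer.writerow([record[field] for field in fields])
    return out.getvalue()
```

To use it in the pipeline in place of jq, save it in a script that ends with `sys.stdout.write(json_lines_to_csv(sys.stdin, ["uid", "gender", "age"]))` (after `import sys`) and pipe the script's output into the same cqlsh COPY command.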

Upvotes: 0

root

Reputation: 3937

To insert JSON data, add the JSON keyword to the INSERT command. Refer to the documentation for details: https://docs.datastax.com/en/cql/3.3/cql/cql_using/useInsertJSON.html
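As a rough sketch of how that could be applied to a whole file, the Python helper below builds one INSERT ... JSON statement per record. The function name `insert_json_statement` and the table name `labdata.clients` are assumptions for illustration; the JSON keys must match the column names of your own table. INSERT ... JSON is available in Cassandra 2.2 and later.

```python
import json


def insert_json_statement(table, record):
    """Build a CQL INSERT ... JSON statement for one record (a dict)."""
    payload = json.dumps(record)
    # CQL string literals use single quotes; escape embedded ones by doubling
    payload = payload.replace("'", "''")
    return f"INSERT INTO {table} JSON '{payload}';"


# Example with a hypothetical table and record
print(insert_json_statement("labdata.clients", {"uid": "abc", "age": 19}))
```

You could write the generated statements to a file and run it with cqlsh -f. For 80,000 rows this will probably be slower than a bulk loader, but it needs no extra tools.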

Upvotes: 0
