Reputation: 405
I have a large dataset consisting of about 80,000 records that I want to import into Cassandra. I only see documentation for the CSV format. Is this possible with JSON?
Upvotes: 2
Views: 7609
Reputation: 87119
In 2020, you can use the DataStax Bulk Loader utility (DSBulk) for loading and unloading Cassandra/DSE data in CSV and JSON formats. It's very flexible: it allows you to load only part of the data, flexibly map JSON fields to table columns, etc. It supports Cassandra 2.1+ and is very fast.
In the simplest case, the data loading command looks like the following:
dsbulk load -k keyspace -t table -c json -url your_file.json
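If the JSON field names don't match your table columns, DSBulk can map them explicitly via the -m (--schema.mapping) option. A minimal sketch, assuming hypothetical JSON fields user_id and sex that should land in the columns uid and gender (check the DSBulk documentation for the exact mapping syntax):
# map JSON fields to differently named table columns
dsbulk load -k keyspace -t table -c json -url your_file.json \
  -m 'user_id = uid, sex = gender, age = age'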
The DataStax blog has a series of articles about DSBulk: 1, 2, 3, 4, 5, 6
Upvotes: 4
Reputation: 1415
See the dsbulk solution above as the ultimate one; however, you may consider this trick, which converts JSON-formatted messages (one per line) to CSV on the fly (no separate conversion step necessary) and loads them into Cassandra using cqlsh, i.e.:
cat file.json | jq -r '[.uid,.gender,.age] | @csv' | cqlsh -e 'COPY labdata.clients(uid,gender,age) from STDIN;'
Explanations:
This requires the jq utility, which can be installed e.g. on Ubuntu with apt install jq.
Here I have a file with the following messages:
{"uid": "d50192e5-c44e-4ae8-ae7a-7cfe67c8b777", "gender": "F", "age": 19}
{"uid": "d502331d-621e-4721-ada2-5d30b2c3801f", "gender": "M", "age": 32}
This is how I convert it to CSV on the fly:
cat file.json | jq -r '[.uid,.gender,.age] | @csv'
where -r outputs raw strings, removing the extra escaped quotes (\"), but you still end up with quoted string fields:
"d50192e5-c44e-4ae8-ae7a-7cfe67c8b777","F",19
"d502331d-621e-4721-ada2-5d30b2c3801f","M",32
Now, if you create a table clients in the keyspace labdata for this data using cqlsh:
CREATE TABLE clients ( uid ascii PRIMARY KEY, gender ascii, age int);
then you should be able to run the COPY ... FROM STDIN command above.
Upvotes: 0
Reputation: 3937
To insert JSON data, add the JSON keyword to the INSERT command. Refer to this link for details: https://docs.datastax.com/en/cql/3.3/cql/cql_using/useInsertJSON.html
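For example, using the labdata.clients table from the other answer (a sketch; substitute your own keyspace, table, and values):
INSERT INTO labdata.clients JSON '{"uid": "d50192e5-c44e-4ae8-ae7a-7cfe67c8b777", "gender": "F", "age": 19}';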
Upvotes: 0