F.S.

Reputation: 1261

Use dsbulk load in python

I created a Cassandra database in DataStax Astra. I'm able to connect to it in Python (using the cassandra-driver module and the secure_connect_bundle). I wrote a few APIs in my Python application to query the database.

I read that I can upload CSV files to it using dsbulk. I am able to run the following command in Terminal, and it works:

dsbulk load -url data.csv -k foo_keyspace -t foo_table \
-b "secure-connect-afterpay.zip" -u username -p password -header true

Then I try to run this same line in Python using subprocess:

ret = subprocess.run(
    ['dsbulk', 'load', '-url', 'data.csv', '-k', 'foo_keyspace', '-t', 'foo_table', 
     '-b', 'secure-connect-afterpay.zip', '-u', 'username', '-p', 'password', 
     '-header', 'true'],
    capture_output=True
)

But I get FileNotFoundError: [Errno 2] No such file or directory: 'dsbulk': 'dsbulk'. Why is dsbulk not recognized when I run it from Python?


A related question: it's probably not best practice to rely on subprocess. Are there better ways to upload batch data to Cassandra?

Upvotes: 2

Views: 868

Answers (1)

Adam Holmberg

Reputation: 7365

I think it has to do with the way PATH is handled by subprocess. Try specifying the command as an absolute path, or a relative one like "./dsbulk" or "bin/dsbulk".
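As a sketch, you can resolve the executable before calling subprocess.run; shutil.which searches the same PATH your Python process sees, so it tells you whether 'dsbulk' is findable at all. The fallback install path below is a placeholder you would adjust to your machine:

```python
import shutil

# shutil.which() mirrors what subprocess.run() can resolve: it searches the
# PATH of the current Python process. None means dsbulk is not on that PATH.
dsbulk_path = shutil.which('dsbulk')
if dsbulk_path is None:
    # Fall back to an explicit location (placeholder -- adjust to your install).
    dsbulk_path = './dsbulk/bin/dsbulk'

cmd = [dsbulk_path, 'load', '-url', 'data.csv', '-k', 'foo_keyspace',
       '-t', 'foo_table', '-b', 'secure-connect-afterpay.zip',
       '-u', 'username', '-p', 'password', '-header', 'true']
# subprocess.run(cmd, capture_output=True)  # run once the path is correct
```

Note that GUI-launched Python (IDEs, Jupyter) often inherits a shorter PATH than your Terminal shell, which is why the same command can work in one and not the other.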

Alternatively, if you add the bin directory from the DSBulk package to your PATH environment variable, the command will work exactly as you have it.
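You can also do this from inside Python for the current process and its children, since subprocess inherits os.environ by default (the bin directory below is a placeholder for wherever you unpacked DSBulk):

```python
import os

# Prepend the DSBulk bin directory (placeholder path) to PATH. Child
# processes started with subprocess.run() inherit this modified environment,
# so ['dsbulk', ...] becomes resolvable -- assuming the executable is there.
dsbulk_bin = os.path.expanduser('~/dsbulk/bin')
os.environ['PATH'] = os.pathsep.join([dsbulk_bin, os.environ.get('PATH', '')])
```

For a permanent fix, export the same directory in your shell profile instead.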

Upvotes: 4
