F.S.

Reputation: 1261

Use dsbulk load in python

I created a Cassandra database in DataStax Astra. I'm able to connect to it in Python (using the cassandra-driver module and the secure_connect_bundle). I wrote a few APIs in my Python application to query the database.

I read that I can upload CSV files to it using dsbulk. I am able to run the following command in Terminal, and it works:

dsbulk load -url data.csv -k foo_keyspace -t foo_table \
-b "secure-connect-afterpay.zip" -u username -p password -header true

Then I try to run this same line in Python using subprocess:

ret = subprocess.run(
    ['dsbulk', 'load', '-url', 'data.csv', '-k', 'foo_keyspace', '-t', 'foo_table', 
     '-b', 'secure-connect-afterpay.zip', '-u', 'username', '-p', 'password', 
     '-header', 'true'],
    capture_output=True
)

But I get FileNotFoundError: [Errno 2] No such file or directory: 'dsbulk': 'dsbulk'. Why is dsbulk not recognized when I run it from Python?


A related question: it's probably not best practice to rely on subprocess. Are there better ways to upload batch data to Cassandra?

Upvotes: 2

Views: 868

Answers (1)

Adam Holmberg

Reputation: 7365

I think it has to do with the way PATH is handled by subprocess. Try specifying the command as an absolute path, or a relative one like "./dsbulk" or "bin/dsbulk".
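As a sketch, you can resolve the executable before calling subprocess.run; shutil.which searches the same PATH your Python process sees, so it tells you whether 'dsbulk' is findable at all. The fallback install path below is a placeholder you would adjust to your machine:

```python
import shutil

# shutil.which() mirrors what subprocess.run() can resolve: it searches the
# PATH of the current Python process. None means dsbulk is not on that PATH.
dsbulk_path = shutil.which('dsbulk')
if dsbulk_path is None:
    # Fall back to an explicit location (placeholder -- adjust to your install).
    dsbulk_path = './dsbulk/bin/dsbulk'

cmd = [dsbulk_path, 'load', '-url', 'data.csv', '-k', 'foo_keyspace',
       '-t', 'foo_table', '-b', 'secure-connect-afterpay.zip',
       '-u', 'username', '-p', 'password', '-header', 'true']
# subprocess.run(cmd, capture_output=True)  # run once the path is correct
```

Note that GUI-launched Python (IDEs, Jupyter) often inherits a shorter PATH than your Terminal shell, which is why the same command can work in one and not the other.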

Alternatively, if you add the bin directory from the DSBulk package to your PATH environment variable, the command will work exactly as you have it.
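You can also do this from inside Python for the current process and its children, since subprocess inherits os.environ by default (the bin directory below is a placeholder for wherever you unpacked DSBulk):

```python
import os

# Prepend the DSBulk bin directory (placeholder path) to PATH. Child
# processes started with subprocess.run() inherit this modified environment,
# so ['dsbulk', ...] becomes resolvable -- assuming the executable is there.
dsbulk_bin = os.path.expanduser('~/dsbulk/bin')
os.environ['PATH'] = os.pathsep.join([dsbulk_bin, os.environ.get('PATH', '')])
```

For a permanent fix, export the same directory in your shell profile instead.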

Upvotes: 4
