Reputation: 1261
I created a Cassandra database in DataStax Astra. I'm able to connect to it in Python (using the cassandra-driver module and the secure_connect_bundle). I wrote a few APIs in my Python application to query the database.
I read that I can upload CSV files to it using dsbulk. I am able to run the following command in Terminal and it works:
dsbulk load -url data.csv -k foo_keyspace -t foo_table \
-b "secure-connect-afterpay.zip" -u username -p password -header true
Then I try to run the same line in Python using subprocess:
ret = subprocess.run(
    ['dsbulk', 'load', '-url', 'data.csv', '-k', 'foo_keyspace', '-t', 'foo_table',
     '-b', 'secure-connect-afterpay.zip', '-u', 'username', '-p', 'password',
     '-header', 'true'],
    capture_output=True
)
But I got FileNotFoundError: [Errno 2] No such file or directory: 'dsbulk': 'dsbulk'. Why is dsbulk not recognized when I run it from Python?
A related question: it's probably not best practice to rely on subprocess. Are there better ways to upload batch data to Cassandra?
Upvotes: 2
Views: 868
Reputation: 7365
I think it has to do with the way the PATH is handled when subprocess launches the command. Try specifying the command as an absolute path, or a relative one like "./dsbulk" or "bin/dsbulk".
Alternatively, if you add the bin directory from the DS Bulk package to your PATH environment variable, it will work as you have it.
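A minimal sketch of both options, assuming a hypothetical install location of ~/dsbulk-1.10/bin (adjust to wherever you unpacked the DSBulk package):

```python
import os
import shutil
import subprocess

# Hypothetical DSBulk install location -- adjust to your actual path.
dsbulk_bin = os.path.expanduser("~/dsbulk-1.10/bin")

# Option 1: resolve the executable to an absolute path before calling run().
dsbulk = shutil.which("dsbulk") or os.path.join(dsbulk_bin, "dsbulk")

# Option 2: keep the bare name but extend PATH for the child process only.
env = dict(os.environ)
env["PATH"] = dsbulk_bin + os.pathsep + env.get("PATH", "")

cmd = [dsbulk, 'load', '-url', 'data.csv', '-k', 'foo_keyspace', '-t', 'foo_table',
       '-b', 'secure-connect-afterpay.zip', '-u', 'username', '-p', 'password',
       '-header', 'true']
# ret = subprocess.run(cmd, capture_output=True, env=env)
```

Either approach avoids depending on whatever PATH your interactive shell happens to export; shell startup files (e.g. .bashrc) are not read by a Python process started elsewhere, which is why the bare name worked in Terminal but not from subprocess.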
Upvotes: 4