kyle regino
kyle regino

Reputation: 35

Import csv file in cassandra using python script

I want to import a csv file in cassandra using python script. I already know how to import csv in cassandra using the COPY command in the cqlsh.

My csv file with dtypes: timestamp, timestamp, decimal, decimal, decimal, decimal, decimal, decimal

I already have a code, and I am only lacking on how I would import the csv file inside cassandra

KEYSPACE = "test_cassandra"

cluster = Cluster(['127.0.0.1'])
session = cluster.connect()

log.info("creating keyspace . . .")
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS %s 
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'} 
    """ % KEYSPACE)

log.info("setting keyspace . . .")
session.set_keyspace(KEYSPACE)

log.info("creating table . . .")
session.execute("""
    CREATE TABLE IF NOT EXISTS test_NYC_taxi (
        pickup timestamp, 
        dropoff timestamp, 
        distance decimal, 
        fare decimal, 
        p_long decimal, 
        p_lat decimal, 
        d_long decimal, 
        d_lat decimal, 
        PRIMARY KEY(pickup, dropoff, distance));
    """)



prepared = session.prepare("""
        INSERT INTO test_nyc_taxi (pickup, dropoff, distance, fare, p_long, p_lat, d_long, d_lat)
        VALUES (?, ?, ?, ?, ?, ?, ?, ?)
        """)


Any help would be very much appreciated. :)

Upvotes: 1

Views: 3911

Answers (2)

Aaron
Aaron

Reputation: 57748

You're on the right path. The only things you need to add, is a file reader (I saved your data as a .csv file and got it to work), a for-loop to iterate through it, and then an import of the datetime package to convert your dates for the prepared statement.

This worked for me:

with open("/home/aaron/Documents/stackoverflow/test_NYC_taxi.csv", "r") as fares:
    for fare in fares:
        columns=fare.split(",")
        pickup=datetime.datetime.strptime(columns[0],"%Y-%m-%d").date()
        dropoff=datetime.datetime.strptime(columns[1],"%Y-%m-%d").date()
        distance=columns[2]
        fare=columns[3]
        p_long=columns[4]
        p_lat=columns[5]
        d_long=columns[6]
        d_lat=columns[7]

        session.execute(prepared, [pickup,dropoff,distance,fare,p_long,p_lat,d_long,d_lat])

#closing the file
fares.close()

#closing Cassandra connection
session.shutdown()

Upvotes: 1

charybr
charybr

Reputation: 1948

Open the csv file reader in python code. Iterate over the lines and call the insert statement. Please check below link.

https://stackoverflow.com/a/19828117/1358551

Upvotes: 0

Related Questions