djpiky
djpiky

Reputation: 45

parsing huge csv file into mysql [python]

I have some problems with parsing huge csv file into mysql databse.

Csv file looks like this:

ref1  data1  data2  data3...
ref1  data4  data5  data6...
ref2  data1  data2  data3 data4 data5..
ref2  data12 data13 data14
ref2  data21 data22...
.
.
.

Csv file has about 1 milion lines or about 7MB in zip file or about 150MB unzip.

My job is to parse the data from csv into mysql, but only the data/lines when references matches. Another problem is, that from multiple lines in csv i must parse it in only one line in mysql for one reference.

I tryed to do this with csv.reader and for loops on each references, but is ultra slow.

with con:
cur.execute("SELECT ref FROM users")
user=cur.fetchall()
for i in range(len(user)):
    with open('hugecsv.csv', mode='rb') as f:
        reader = csv.reader(f, delimiter=';')                               
        for row in reader:
            if(str(user[i][0])==row[0]):
                writer.writerow(row)

So i have all references which i would like to parse, in my list user. Which is the fastes way to parse?

Please help!

Upvotes: 1

Views: 1949

Answers (3)

bruno desthuilliers
bruno desthuilliers

Reputation: 77912

The first obvious bottleneck is that you are reopening and scanning the whole CSV file for each user in your database. Doing a single pass on the csv would be faster :

# faster lookup on users
cur.execute ("select ref from users")
users = set(row[0] for row in cur.fetchall())

with open("your/file.CSV") as f:
    r = reader(f)
    for row in r:
        if row[0] in users:
            do_something_with(row)

Upvotes: 2

Paul Draper
Paul Draper

Reputation: 83215

You haven't included all your logic. If you just want to import everything into a single table,

cur.execute("LOAD DATA INFILE 'path_to_file.csv' INTO TABLE my_table;")

MySQL does it directly. You can't get any faster than that.

Documentation

Upvotes: 0

BaBL86
BaBL86

Reputation: 2622

Use:

LOAD DATA INFILE 'EF_PerechenSkollekciyami.csv' TO `TABLE_NAME` FIELDS TERMINATED BY ';'

This is an internal query command in mysql.

I don't recommend you to use tabs to separate columns, and recommend you to change this by sed to ; or something another character. But you can try with tabs too.

Upvotes: 1

Related Questions