Reputation: 45
I have a problem parsing a huge CSV file into a MySQL database.
The CSV file looks like this:
ref1 data1 data2 data3...
ref1 data4 data5 data6...
ref2 data1 data2 data3 data4 data5..
ref2 data12 data13 data14
ref2 data21 data22...
...
The CSV file has about 1 million lines: about 7 MB zipped, or about 150 MB unzipped.
My job is to parse the data from the CSV into MySQL, but only the lines whose reference matches one in the database. Another problem is that the multiple CSV lines for one reference must be merged into a single MySQL row.
I tried to do this with csv.reader and a for loop over each reference, but it is extremely slow.
import csv

with con:
    cur.execute("SELECT ref FROM users")
    user = cur.fetchall()
    for i in range(len(user)):
        with open('hugecsv.csv', mode='rb') as f:
            reader = csv.reader(f, delimiter=';')
            for row in reader:
                if str(user[i][0]) == row[0]:
                    writer.writerow(row)
So I have all the references I want to match in my list user. What is the fastest way to parse this?
Please help!
Upvotes: 1
Views: 1949
Reputation: 77912
The first obvious bottleneck is that you reopen and scan the whole CSV file once per user in your database. Doing a single pass over the CSV is much faster:
import csv

# set membership tests are O(1), so this makes the per-row lookup fast
cur.execute("SELECT ref FROM users")
users = set(row[0] for row in cur.fetchall())

with open("your/file.csv") as f:
    r = csv.reader(f, delimiter=';')
    for row in r:
        if row[0] in users:
            do_something_with(row)
Upvotes: 2
Reputation: 83215
You haven't included all your logic. If you just want to import everything into a single table:
cur.execute("LOAD DATA INFILE 'path_to_file.csv' INTO TABLE my_table;")
MySQL does it directly. You can't get any faster than that.
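If only the rows whose reference exists in users should be kept, one common pattern (a sketch; the staging table and column names here are assumptions, adjust them to the real CSV) is to bulk-load into a staging table and then filter with a join:

```sql
-- Hypothetical staging table; add one column per CSV field.
CREATE TABLE staging (ref VARCHAR(64), data TEXT);

LOAD DATA INFILE 'path_to_file.csv'
INTO TABLE staging
FIELDS TERMINATED BY ';';

-- Keep only rows whose reference exists in users.
INSERT INTO my_table (ref, data)
SELECT s.ref, s.data
FROM staging s
JOIN users u ON u.ref = s.ref;
```

This keeps the fast LOAD DATA path while still doing the matching inside MySQL.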
Upvotes: 0
Reputation: 2622
Use:
LOAD DATA INFILE 'EF_PerechenSkollekciyami.csv' INTO TABLE `TABLE_NAME` FIELDS TERMINATED BY ';'
This is MySQL's built-in bulk-loading statement.
I don't recommend using tabs to separate columns; I recommend converting them with sed to ';' or some other character first. But you can try with tabs too.
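The sed conversion mentioned above could look like this (a sketch using GNU sed, which understands `\t`; the file names are made up, and the sample line just stands in for the real data):

```shell
# Create a tiny tab-separated sample file (stand-in for the real CSV).
printf 'ref1\tdata1\tdata2\n' > sample.tsv

# GNU sed: replace every tab with a semicolon.
sed 's/\t/;/g' sample.tsv > sample.csv
```

On BSD/macOS sed, use a literal tab (Ctrl-V Tab) instead of `\t`.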
Upvotes: 1