Tania

Reputation: 1925

Python CSV parsing fills up memory

I have a CSV file which has over a million rows and I am trying to parse this file and insert the rows into the DB.

    import csv

    with open(file, "rb") as csvfile:
        re = csv.DictReader(csvfile)
        for row in re:
            # insert row['column_name'] into DB
            pass

For CSV files below 2 MB this works well, but anything larger ends up eating my memory. It is probably because I store the DictReader's contents in a list called "re" and it is not able to loop over such a huge list. I definitely need to access the CSV file by its column names, which is why I chose DictReader: it easily provides column-level access to my CSV files. Can anyone tell me why this is happening and how it can be avoided?

Upvotes: 0

Views: 1084

Answers (2)

Serge Ballesta

Reputation: 148880

The DictReader does not load the whole file into memory but reads it in chunks, as explained in this answer suggested by DhruvPathak.
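A minimal sketch of that lazy behaviour, written for Python 3 (the question's code looks like Python 2). The helper `counting_lines` and the sample data are made up for illustration; the helper only counts how many lines the reader has pulled from its input so far.

```python
import csv

def counting_lines(lines, counter):
    """Yield lines one by one, counting how many have been consumed."""
    for line in lines:
        counter["read"] += 1
        yield line

counter = {"read": 0}
source = ["a,b", "1,2", "3,4", "5,6"]  # hypothetical sample data

# DictReader accepts any iterable of lines and is itself an iterator, not a list.
reader = csv.DictReader(counting_lines(source, counter))

first = next(reader)               # pulls the header plus one data row
lines_read_so_far = counter["read"]

remaining = list(reader)           # the rest is only read when asked for
```

After the single `next(reader)` call, the later rows have not been read yet, so the loop in the question does not hold the whole file in a list.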

But depending on your database engine, the actual write to disk may only happen at commit. That means that the database (and not the CSV reader) keeps all the data in memory and eventually exhausts it.

So you should try to commit every n records, with n typically between 10 and 1000 depending on the size of your lines and the available memory.
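Committing every n records could look roughly like the sketch below. It is written for Python 3 and assumes `sqlite3` purely for illustration, since the question does not name a database; the function `load_csv_in_batches`, the table `items`, and the default `batch_size` are all hypothetical.

```python
import csv
import sqlite3

def load_csv_in_batches(csv_path, conn, batch_size=500):
    """Insert one column of a CSV file into the (hypothetical) table `items`,
    committing after every `batch_size` rows. Returns the number of rows inserted."""
    cur = conn.cursor()
    pending = 0
    total = 0
    with open(csv_path, newline="") as csvfile:
        for row in csv.DictReader(csvfile):
            cur.execute(
                "INSERT INTO items (column_name) VALUES (?)",
                (row["column_name"],),
            )
            pending += 1
            total += 1
            if pending >= batch_size:
                conn.commit()  # flush this batch so it is not held until the end
                pending = 0
    conn.commit()  # commit whatever is left in the last partial batch
    return total
```

With another database driver the calls would differ, but the pattern of counting rows and committing periodically stays the same.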

Upvotes: 4

Julien Spronck

Reputation: 15423

If you don't need entire columns at once, you can simply read the file line by line, as you would with a text file, and parse each row yourself. The exact parsing depends on your data format, but you could do something like:

delimiter = ','
with open(filename, 'r') as fil:
    # Read the header line and map each column name to its index
    headers = next(fil).strip().split(delimiter)
    dic_headers = {hdr: i for i, hdr in enumerate(headers)}
    for line in fil:
        row = line.strip().split(delimiter)
        value = row[dic_headers['column_name']]  # do something with value

This is a very simple example, but it can be made more elaborate. For example, it does not work if a field in your data itself contains the delimiter (such as a quoted value with a comma inside it).
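To illustrate that limitation, here is a small Python 3 sketch with made-up sample data. It compares a plain `split` with the standard `csv.reader`, which understands quoted fields.

```python
import csv
import io

# Hypothetical sample: the first field of the data row contains a comma.
sample = 'name,city\n"Doe, Jane",Paris\n'

# A naive split breaks the quoted field into two pieces.
naive = sample.splitlines()[1].split(",")
# naive == ['"Doe', ' Jane"', 'Paris']

# csv.reader keeps the quoted field together and strips the quotes.
parsed = list(csv.reader(io.StringIO(sample)))[1]
# parsed == ['Doe, Jane', 'Paris']
```

So the manual approach is only safe when you know the fields never contain the delimiter.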

Upvotes: 1
