
Reputation: 363

Populating a dictionary from a csv file with extremely large field sizes

I get an error from the csv module when importing a .csv file with a field larger than 131,072 characters. The csv module will happily export fields exceeding 131,072, so the failure only shows up on import. The oversized field is the dictionary's value; my keys are small. Do I need a different file format to store dictionaries with huge values?

I use csv throughout my program, so sticking with it would be convenient. If a different format is unavoidable, what is a good alternative? I'd like to store values that could be thousands to millions of characters long.

Here's the error message:

dictionary = e.csv_import(filename)
File "D:\Matt\Documents\Projects\Python\Project 17\e.py", line 8, in csv_import
for key, value in csv.reader(open(filename)):
_csv.Error: field larger than field limit (131072)

Here's my code:

import csv

def csv_import(filename):
    dictionary = {}
    with open(filename, newline="") as f:
        for key, value in csv.reader(f):
            dictionary[key] = value
    return dictionary

def csv_export(dictionary, filename):
    with open(filename, "w", newline="") as f:
        csv_file = csv.writer(f)
        for key, value in dictionary.items():
            csv_file.writerow([key, value])

Upvotes: 0

Views: 880

Answers (2)

mhawke

Reputation: 87054

You can adjust the maximum field size via:

>>> import csv
>>> csv.field_size_limit()
131072
>>> old_size = csv.field_size_limit(1024*1024)
>>> csv.field_size_limit()
1048576
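
With the limit raised, the question's `csv_import` approach works unchanged on oversized fields. A minimal round-trip sketch (the filename `demo.csv` is just a placeholder for this demo):

```python
import csv

filename = "demo.csv"

# Write a row whose value field exceeds the default 131072-character limit.
with open(filename, "w", newline="") as f:
    csv.writer(f).writerow(["key1", "x" * 200000])

# Raise the limit before reading, then load the file into a dict as before.
csv.field_size_limit(1024 * 1024)
with open(filename, newline="") as f:
    dictionary = {key: value for key, value in csv.reader(f)}

print(len(dictionary["key1"]))  # 200000
```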

For alternatives see below.

You want a persistent dictionary so you could use the shelve module.

import shelve

# open shelf and write a large value
shelf = shelve.open(filename)
shelf['a'] = 'b' * 200000
shelf.close()

# read it back in
shelf = shelve.open(filename)
print(len(shelf['a']))   # 200000
shelf.close()

Under the hood it's using pickle, so there are compatibility issues if you want to use the shelf file outside of Python. If cross-language compatibility is required, you could use JSON to serialise your dictionary instead - I assume that the dictionary's values are strings.

import json

def dict_import(filename):
    with open(filename) as f:
        return json.load(f)

def dict_export(dictionary, filename): 
    with open(filename, "w") as f:
        json.dump(dictionary, f)
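
JSON has no field-size cap, so a value far beyond csv's default limit round-trips without any tuning. A quick sketch using a throwaway temp-file path (the filename is hypothetical):

```python
import json
import os
import tempfile

# Assumption: this path is a disposable location used only for the demo.
filename = os.path.join(tempfile.gettempdir(), "dict_demo.json")

data = {"key1": "x" * 200000}  # value well past csv's 131072 default limit

with open(filename, "w") as f:
    json.dump(data, f)

with open(filename) as f:
    loaded = json.load(f)

print(len(loaded["key1"]))  # 200000
```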

Upvotes: 1

Gerrat

Reputation: 29680

If you're looking for an alternative, you should probably just use pickle. It's much faster, and much easier than converting to and from a .csv file.

eg.

import pickle

with open(filename, "rb") as f:
    dictionary = pickle.load(f)

and

with open(filename, "wb") as f:
    pickle.dump(dictionary, f)

One downside is that it's not easily read by other languages (if that's a consideration). Note also that the files must be opened in binary mode - pickle data isn't text.

Upvotes: 2
