
Reputation: 363

Populating a dictionary from a csv file with extremely large field sizes

I get an error from the csv module when importing a .csv file with a field larger than 131,072 characters. The csv module will happily export fields exceeding 131,072, so the failure only shows up on import. The oversized field is the dictionary's value; my keys are small. Do I need a different file format to store dictionaries with huge values?

I use csv throughout my program, so sticking with it would be convenient. If a different format is unavoidable, what is a good alternative? I'd like to store values that could be thousands to millions of characters long.

Here's the error message:

dictionary = e.csv_import(filename)
File "D:\Matt\Documents\Projects\Python\Project 17\e.py", line 8, in csv_import
for key, value in csv.reader(open(filename)):
_csv.Error: field larger than field limit (131072)

Here's my code:

import csv

def csv_import(filename):
    dictionary = {}
    with open(filename, newline="") as f:
        for key, value in csv.reader(f):
            dictionary[key] = value
    return dictionary

def csv_export(dictionary, filename):
    with open(filename, "w", newline="") as f:
        csv_file = csv.writer(f)
        for key, value in dictionary.items():
            csv_file.writerow([key, value])

Upvotes: 0

Views: 880

Answers (2)

mhawke

Reputation: 87054

You can adjust the maximum field size via:

>>> import csv
>>> csv.field_size_limit()
131072
>>> old_size = csv.field_size_limit(1024*1024)
>>> csv.field_size_limit()
1048576
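
With the limit raised, the question's `csv_import` approach works unchanged on oversized fields. A minimal round-trip sketch (the filename `demo.csv` is just a placeholder for this demo):

```python
import csv

filename = "demo.csv"

# Write a row whose value field exceeds the default 131072-character limit.
with open(filename, "w", newline="") as f:
    csv.writer(f).writerow(["key1", "x" * 200000])

# Raise the limit before reading, then load the file into a dict as before.
csv.field_size_limit(1024 * 1024)
with open(filename, newline="") as f:
    dictionary = {key: value for key, value in csv.reader(f)}

print(len(dictionary["key1"]))  # 200000
```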

For alternatives see below.

You want a persistent dictionary so you could use the shelve module.

import shelve

# open shelf and write a large value
shelf = shelve.open(filename)
shelf['a'] = 'b' * 200000
shelf.close()

# read it back in
shelf = shelve.open(filename)
print(len(shelf['a']))   # 200000
shelf.close()

Under the hood it's using pickle, so there are compatibility issues if you want to use the shelf file outside of Python. If cross-language compatibility is required, you could use JSON to serialise your dictionary instead - I assume that the dictionary's values are strings.

import json

def dict_import(filename):
    with open(filename) as f:
        return json.load(f)

def dict_export(dictionary, filename): 
    with open(filename, "w") as f:
        json.dump(dictionary, f)
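
JSON has no field-size cap, so a value far beyond csv's default limit round-trips without any tuning. A quick sketch using a throwaway temp-file path (the filename is hypothetical):

```python
import json
import os
import tempfile

# Assumption: this path is a disposable location used only for the demo.
filename = os.path.join(tempfile.gettempdir(), "dict_demo.json")

data = {"key1": "x" * 200000}  # value well past csv's 131072 default limit

with open(filename, "w") as f:
    json.dump(data, f)

with open(filename) as f:
    loaded = json.load(f)

print(len(loaded["key1"]))  # 200000
```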

Upvotes: 1

Gerrat

Reputation: 29680

If you're looking for an alternative, you should probably just use pickle. It's much faster, and much easier than converting to and from a .csv file.

eg.

import pickle

with open(filename, "rb") as f:
    dictionary = pickle.load(f)

and

with open(filename, "wb") as f:
    pickle.dump(dictionary, f)

One downside is that it's not easily read by other languages (if that's a consideration). Note also that the files must be opened in binary mode - pickle data isn't text.

Upvotes: 2
