Reputation: 2216
I have to read a large CSV file of almost 100K rows, and it would be much easier to process if I could read each row as a dictionary.
After a little research I found csv.DictReader in Python's built-in csv module.
But the documentation does not clearly state whether it stores the whole file in memory.
It does mention that:
The fieldnames parameter is a sequence whose elements are associated with the fields of the input data in order.
But I'm not sure whether that sequence is stored in memory.
So the question is: does it store the whole file in memory?
If so, is there another way to read the file one row at a time, like a generator expression, and get each row as a dict?
Here is my code:
def file_to_dictionary(self, file_path):
    """Read CSV rows as a dictionary."""
    file_data_obj = {}
    try:
        self.log("Reading file: [{}]".format(file_path))
        if os.path.exists(file_path):
            file_data_obj = csv.DictReader(open(file_path, 'rU'))
        else:
            self.log("File does not exist: {}".format(file_path))
    except Exception as e:
        self.log("Failed to read file.", e, True)
    return file_data_obj
Upvotes: 3
Views: 3565
Reputation: 1005
As far as I'm aware, the DictReader object you create (file_data_obj in your case) is an iterator that behaves like a generator: it reads rows from the file one at a time as you iterate over it.
The whole file is not loaded into memory, but the reader can only be iterated over once!
To print the fieldnames of your data as a list you can simply use: print file_data_obj.fieldnames
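If you want the row-at-a-time behaviour wrapped in a function, a minimal sketch could look like the one below. It assumes Python 3 (text mode with newline='' rather than the 'rU' mode from the question), and the name iter_csv_rows is only illustrative, not part of the csv module.

```python
import csv


def iter_csv_rows(path):
    """Yield each CSV row as a dict, reading the file lazily.

    Hypothetical helper for illustration; assumes Python 3.
    """
    # newline='' is how the csv docs recommend opening text-mode files.
    with open(path, newline='') as csv_in:
        reader = csv.DictReader(csv_in)
        for row in reader:
            # Only the current row is materialised here; the file is
            # closed once the generator is exhausted or discarded.
            yield row
```

You would then loop with something like "for row in iter_csv_rows(file_path): ..." and handle one row per iteration. Because this is a generator, it is also single-pass: call the function again if you need a second pass over the file.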
Secondly, in my experience it is much easier to work with a list of dictionaries when reading data from CSV files, where each dictionary represents a row in your file. Note that this does load every row into memory. Consider the following:
import csv

def csv_to_dict_list(path):
    # 'rb' is the mode the csv module expects on Python 2
    with open(path, 'rb') as csv_in:
        reader = csv.DictReader(csv_in, restkey=None, restval=None, dialect='excel')
        fields = reader.fieldnames
        list_out = [row for row in reader]
    return list_out, fields
Using the function above (or something similar), you can achieve your goal with a couple of lines. E.g.:
data, data_fields = csv_to_dict_list(path)
print data_fields  # prints the fieldnames
print data[0]  # prints the first row of data from the file
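The code above is written for Python 2. On Python 3 the csv module expects a text-mode file, so a rough equivalent (a sketch under that assumption, keeping the same function name) might be:

```python
import csv


def csv_to_dict_list(path):
    """Return (rows, fieldnames), with every row loaded as a dict.

    Python 3 sketch of the function above; assumes the file has a header row.
    """
    # Text mode with newline='' replaces the Python 2 'rb' mode.
    with open(path, newline='') as csv_in:
        reader = csv.DictReader(csv_in)
        fields = reader.fieldnames  # reading this consumes the header row
        rows = list(reader)         # loads all remaining rows into memory
    return rows, fields
```

With this version, print is a function, so the usage lines become print(data_fields) and print(data[0]).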
Hope this helps! Luke
Upvotes: 5