Reputation: 40648
How can I use the csv module's reader to store a parsed row in a numpy array? I want to use the csv module because it supports a quotechar and my data has many embedded commas. I have a very wide file of heterogeneous data, and I have stored the column names and numpy data types in a list of tuples.
I would like to use the csv reader to read each row of the file into a list of strings, and then load that list of strings into a numpy array, coercing the values based on the data types. Is this even possible? I have found a couple of mentions of people using the csv module and numpy/scipy together, but I have yet to see an actual implementation.
This is what I have so far:
Here is a sample of my dtypes array:
In [0]: np_dtypes[20:30]
Out[0]:
[('out_sec_range', dtype('S16')),
 ('out_p_city_name', dtype('S16')),
 ('out_st', dtype('S16')),
 ('out_z5', dtype('S16')),
 ('out_zip4', dtype('S16')),
 ('out_lat', dtype('S16')),
 ('out_long', dtype('S16')),
 ('out_county', dtype('S16')),
 ('out_geo_blk', dtype('S16')),
 ('out_addr_type', dtype('S16'))]
And this is the function I'm working on to import the data:
import csv
import numpy as np

def import_csv(f, dtypes):
    with open(f, 'r') as csvfile:
        reader = csv.reader(csvfile, delimiter=',', quotechar='"')
        next(reader, None)  # skip the header row
        for row in reader:
            # this fails: a list of strings is not coerced
            # against the structured dtype
            data = np.array(row, dtype=dtypes)
            print(data)
My main goal is to be able to import a csv file with embedded commas into a numpy data structure.
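For what it's worth, numpy coerces a sequence of tuples (not lists) against a structured dtype, so converting each csv row with `tuple()` is one way the failing line can be made to work. A minimal sketch, using a made-up two-column dtype and an inline sample rather than the question's actual file:

```python
import csv
import io

import numpy as np

# Hypothetical dtype list standing in for the question's np_dtypes.
dtypes = [('name', 'S16'), ('value', 'f8')]

# Inline sample standing in for the file; note the quoted fields
# with embedded commas, which csv.reader handles correctly.
sample = 'name,value\n"Smith, John",1.5\n"Doe, Jane",2.25\n'

reader = csv.reader(io.StringIO(sample), delimiter=',', quotechar='"')
next(reader, None)  # skip the header row

# Each row becomes a tuple; np.array then coerces every field
# against the matching entry in the structured dtype.
data = np.array([tuple(row) for row in reader], dtype=dtypes)
```

Here `data['name']` holds the strings (as `S16` bytes) and `data['value']` holds floats parsed from the string fields.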
Upvotes: 1
Views: 261
Reputation: 58985
You can perhaps use np.genfromtxt()
together with a helper function that pre-processes each line:
def myfunc(line):
    return line.replace('"', '')  # removing the quotes

a = np.genfromtxt((myfunc(line) for line in open(fname)),
                  delimiter=',', dtype=None)
Note: you can probably use your dtype instead of None, but the latter (which makes genfromtxt infer the types) usually works properly when your first row contains the column names.
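A runnable sketch of this approach with inline sample data and invented column names. Note the caveat: stripping the quotes up front means this only behaves when the quoted fields themselves contain no commas, which is not the case in the question's data.

```python
import io

import numpy as np

# Inline quoted csv standing in for the file; columns 'x' and 'y'
# are hypothetical names, and no field contains an embedded comma.
sample = '"x","y"\n"1","2.5"\n"3","4.5"\n'

def strip_quotes(line):
    return line.replace('"', '')  # removing the quotes

# names=True takes the first row as field names; dtype=None lets
# genfromtxt infer each column's type; encoding=None keeps str input.
arr = np.genfromtxt((strip_quotes(line) for line in io.StringIO(sample)),
                    delimiter=',', dtype=None, names=True, encoding=None)
```

The result is a structured array: `arr['x']` is inferred as integer and `arr['y']` as float.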
Upvotes: 0