Reputation: 40648
How can I use the csv module's reader to store a parsed row in a numpy array? I want to use the csv module because it supports a quotechar and my data has many embedded commas. I have a very wide file of heterogeneous data, and I have stored the column names and numpy data types in a list of tuples.
I would like to use the csv reader to read each row of the file into a list of strings, and then load that list of strings into a numpy array, coercing the values based on the data types. Is this even possible? I have found a couple of mentions of people using the csv module and numpy/scipy together, but I have yet to see an actual implementation.
This is what I have so far:
Here is a sample of my dtypes array:
In [0]: np_dtypes[20:30]
Out[0]:
[('out_sec_range', dtype('S16')),
 ('out_p_city_name', dtype('S16')),
 ('out_st', dtype('S16')),
 ('out_z5', dtype('S16')),
 ('out_zip4', dtype('S16')),
 ('out_lat', dtype('S16')),
 ('out_long', dtype('S16')),
 ('out_county', dtype('S16')),
 ('out_geo_blk', dtype('S16')),
 ('out_addr_type', dtype('S16'))]
And this is the function I'm working on to import the data:
import csv
import numpy as np

def import_csv(f, dtypes):
    with open(f, 'r') as csvfile:
        reader = csv.reader(csvfile, delimiter=',', quotechar='"')
        next(reader, None)  # skip the header row
        for row in reader:
            # this fails: a list of strings is not coerced
            # against the structured dtype
            data = np.array(row, dtype=dtypes)
            print(data)
My main goal is to be able to import a csv file with embedded commas into a numpy data structure.
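For what it's worth, numpy coerces a sequence of tuples (not lists) against a structured dtype, so converting each csv row with `tuple()` is one way the failing line can be made to work. A minimal sketch, using a made-up two-column dtype and an inline sample rather than the question's actual file:

```python
import csv
import io

import numpy as np

# Hypothetical dtype list standing in for the question's np_dtypes.
dtypes = [('name', 'S16'), ('value', 'f8')]

# Inline sample standing in for the file; note the quoted fields
# with embedded commas, which csv.reader handles correctly.
sample = 'name,value\n"Smith, John",1.5\n"Doe, Jane",2.25\n'

reader = csv.reader(io.StringIO(sample), delimiter=',', quotechar='"')
next(reader, None)  # skip the header row

# Each row becomes a tuple; np.array then coerces every field
# against the matching entry in the structured dtype.
data = np.array([tuple(row) for row in reader], dtype=dtypes)
```

Here `data['name']` holds the strings (as `S16` bytes) and `data['value']` holds floats parsed from the string fields.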
Upvotes: 1
Views: 261
Reputation: 58985
You can perhaps use np.genfromtxt()
together with a helper function that pre-processes each line:
def myfunc(line):
    return line.replace('"', '')  # removing the quotes

a = np.genfromtxt((myfunc(line) for line in open(fname)),
                  delimiter=',', dtype=None)
Note: you can probably use your dtype instead of None, but the latter (which makes genfromtxt infer the types) usually works properly when your first row contains the column names.
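A runnable sketch of this approach with inline sample data and invented column names. Note the caveat: stripping the quotes up front means this only behaves when the quoted fields themselves contain no commas, which is not the case in the question's data.

```python
import io

import numpy as np

# Inline quoted csv standing in for the file; columns 'x' and 'y'
# are hypothetical names, and no field contains an embedded comma.
sample = '"x","y"\n"1","2.5"\n"3","4.5"\n'

def strip_quotes(line):
    return line.replace('"', '')  # removing the quotes

# names=True takes the first row as field names; dtype=None lets
# genfromtxt infer each column's type; encoding=None keeps str input.
arr = np.genfromtxt((strip_quotes(line) for line in io.StringIO(sample)),
                    delimiter=',', dtype=None, names=True, encoding=None)
```

The result is a structured array: `arr['x']` is inferred as integer and `arr['y']` as float.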
Upvotes: 0