NumPy: read data from CSV having numerals as strings

Question

I'm reading a .csv file in Python using the following command:

data = np.genfromtxt('home_data.csv', dtype=float, delimiter=',', names=True)

This CSV has a zipcode column whose values are numerals stored as quoted strings, e.g. "85281". After loading, this column contains only nan:

data['zipcode']
Output : array([ nan,  nan,  nan, ...,  nan,  nan,  nan])

How can I convert these string values to integers, so that I get an array of actual values rather than nans?

B. M. · Accepted Answer

You must help genfromtxt a little:

data = np.genfromtxt('home_data.csv',
                     dtype=[int, float], delimiter=',', names=True,
                     converters={0: lambda b: b.decode().strip('"')})

Each field is read as bytes. float(b'1 ') returns 1.0, but float(b'"8210"') raises an error. The converters option lets you define, for each field (here field 0), a function that does the proper conversion: here it turns the bytes into a string (decode) and removes (strip) the surrounding " characters.
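To see why the quotes matter, here is a quick check in plain Python, without NumPy. It is only an illustration of the conversion step; the variable names raw and cleaned are made up for this example.

```python
raw = b'"8210"'            # a quoted field, as bytes

# Surrounding whitespace is tolerated by float()...
plain = float(b'1 ')
print(plain)

# ...but the literal quote characters are not.
try:
    float(raw)
    failed = False
except ValueError as exc:
    failed = True
    print("ValueError:", exc)

# Decode to str, strip the quotes, then convert.
cleaned = raw.decode().strip('"')
zipcode = int(cleaned)
print(zipcode)
```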

If home_data.csv is :

zipcode,val
"8210",1
"8320",2
"14",3

You will obtain:

data -> array([(8210, 1.0), (8320, 2.0), (14, 3.0)], dtype=[('zipcode', '<i8'), ('val', '<f8')])
data['zipcode'] -> array([8210, 8320,   14])
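For a version that runs without a file on disk, here is a minimal sketch of the same idea. It assumes the sample data shown above, fed through io.StringIO instead of home_data.csv. The helper name quoted_int is hypothetical, and the converter returns an int directly rather than a string. encoding='utf-8' is passed explicitly so that the converter receives str; the helper also accepts bytes in case the field arrives undecoded.

```python
import io

import numpy as np

# Sample contents from above, standing in for home_data.csv (assumption).
csv_text = 'zipcode,val\n"8210",1\n"8320",2\n"14",3\n'


def quoted_int(field):
    """Hypothetical helper: turn '"8210"' (str or bytes) into the int 8210."""
    if isinstance(field, bytes):
        field = field.decode()
    return int(field.strip().strip('"'))


data = np.genfromtxt(
    io.StringIO(csv_text),
    dtype=[('zipcode', int), ('val', float)],
    delimiter=',',
    names=True,                  # take field names from the header row
    encoding='utf-8',            # hand str (not bytes) to the converter
    converters={0: quoted_int},  # column 0 is the quoted zipcode column
)

print(data['zipcode'])
print(data['val'])
```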
