genfromtxt different datatypes

Question

I am trying to import data from text files with a varying number of columns. I know that in every file the first column will always be an int and all subsequent columns will be floats. How can I specify this explicitly using dtypes?

dtypes=[int,float,float,float...] #this number will change depending on the number of columns in the file

data=np.genfromtxt(file,dtype=dtypes,delimiter='\t',skip_header=11) #read in the data

Thanks

Thomas Kühn · Accepted Answer

You could first read everything as floats and convert the array into a structured array after you know how many columns you have:

##input.txt:
##    1 1.4 5e23
##    2 2.3 5e-12
##    3 5.7 -1.3e-2

import numpy as np

data = np.genfromtxt('input.txt')
print(data)
print('-'*50)

colw = data.shape[1]

dtypes = [('col0', int)]+[('col{}'.format(i+1),float) for i in range(colw-1)]
print(dtypes)
print('-'*50)

converted_data = np.array([tuple(r) for r in data], dtype = dtypes)

print(converted_data)

This gives the following output:

[[  1.00000000e+00   1.40000000e+00   5.00000000e+23]
 [  2.00000000e+00   2.30000000e+00   5.00000000e-12]
 [  3.00000000e+00   5.70000000e+00  -1.30000000e-02]]
--------------------------------------------------
[('col0', <class 'int'>), ('col1', <class 'float'>), ('col2', <class 'float'>)]
--------------------------------------------------
[(1,  1.4,   5.00000000e+23) (2,  2.3,   5.00000000e-12)
 (3,  5.7,  -1.30000000e-02)]

Tested on Python 3.5
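A possible variation, not part of the code tested above: if you would rather hand the dtypes to genfromtxt directly, as in the question, you can peek at the first data row to count the columns and then build the list before the real read. This is only a sketch. The helper name load_int_then_floats and the temporary demo file are made up for illustration, and it assumes a NumPy version whose genfromtxt supports max_rows and that every row has the same number of columns.

```python
import os
import tempfile

import numpy as np


def load_int_then_floats(path, delimiter=None, skip_header=0):
    """Hypothetical helper: first column as int, all remaining columns as float."""
    # Read only the first data row, as floats, just to count the columns.
    first_row = np.genfromtxt(path, delimiter=delimiter,
                              skip_header=skip_header, max_rows=1)
    ncols = np.atleast_1d(first_row).size

    # One (name, type) pair per column: int first, float for the rest.
    dtypes = [('col0', int)] + [('col{}'.format(i), float)
                                for i in range(1, ncols)]

    # Second read: parse the whole file with the explicit dtypes.
    return np.genfromtxt(path, dtype=dtypes, delimiter=delimiter,
                         skip_header=skip_header)


# Demo on a throwaway file with the same contents as input.txt above.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as fh:
    fh.write('1 1.4 5e23\n2 2.3 5e-12\n3 5.7 -1.3e-2\n')
    sample_path = fh.name

table = load_int_then_floats(sample_path)
os.remove(sample_path)

print(table.dtype.names)  # field names built from the column count
print(table['col0'])      # the integer column, selected by field name
```

For the file in the question you would call it with delimiter='\t' and skip_header=11. The first column is then parsed as an int from the text itself rather than converted from a float afterwards, at the cost of opening the file twice.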
