aag

Reputation: 720

numpy.genfromtxt imports tuples instead of arrays

I am trying to learn Python and NumPy, so please bear with me. I am using numpy.genfromtxt to import a CSV file into a matrix. The CSV looks as follows:

Time(min),Nm,Speed,Power,Distance,Rpm,Bpm,interval,Altitude,Rate,Incline,Temp,PowerBalance,LeftTorqueEffectiveness,RightTorqueEffectiveness,getLeftPedalSmoothness,getRightPedalSmoothness,getCombinedPedalSmoothness,THb,SmO2,km
0.016666668,,4.3555064,0,0.002,0,118,1,684.3,0.0,0.0,14.71,50,-1.0,-1.0,-1.0,-1.0,-1.0,311.72,311.72
0.033333335,,4.3555064,20,0.002,0,119,1,684.3,0.0,0.0,14.71,50,-1.0,-1.0,-1.0,-1.0,-1.0,311.72,311.72
0.05,,4.444291,13,0.004,0,119,1,684.3,0.0,0.0,14.71,50,-1.0,-1.0,-1.0,-1.0,-1.0,311.72,311.72

Now I run:

matrixCsv = np.genfromtxt(open(csvFile, "rb"), delimiter=',', \
                          missing_values=0,skip_header=1,dtype=float,\
                          usecols=(0,2,3,4,5,6,7,8,9,10,11,17),names=True)

and I get:

[ (0.033333335, 4.3555064, 20.0, 0.002, 0.0, 119.0, 1.0, 684.3, 0.0, 0.0, 14.71, -1.0)
(0.05, 4.444291, 13.0, 0.004, 0.0, 119.0, 1.0, 684.3, 0.0, 0.0, 14.71, -1.0)
(0.06666667, 4.4781966, 16.0, 0.006, 0.0, 120.0, 1.0, 684.3, 0.0, 0.0, 14.71, -1.0)
...,

which to me looks like tuples encapsulated into an array. But why tuples? I understand that numpy arrays/matrices need to be homogeneous, and that numpy makes a tuple out of inhomogeneous data. But why is my data inhomogeneous? I do not understand...

Upvotes: 4

Views: 3537

Answers (1)

CT Zhu

Reputation: 54330

You are mixing up how skip_header and names work together. The right way to read the data and use the first row as field names is:

In [185]:

np.genfromtxt('temp.csv', delimiter=',',
              missing_values=0, skip_header=0, dtype=float,
              usecols=(0,2,3,4,5,6,7,8,9,10,11,17), names=True)
Out[185]:
array([ (0.016666668, 4.3555064, 0.0, 0.002, 0.0, 118.0, 1.0, 684.3, 0.0, 0.0, 14.71, -1.0),
       (0.033333335, 4.3555064, 20.0, 0.002, 0.0, 119.0, 1.0, 684.3, 0.0, 0.0, 14.71, -1.0),
       (0.05, 4.444291, 13.0, 0.004, 0.0, 119.0, 1.0, 684.3, 0.0, 0.0, 14.71, -1.0)], 
      dtype=[('Timemin', '<f8'), ('Speed', '<f8'), ('Power', '<f8'), ('Distance', '<f8'), ('Rpm', '<f8'), ('Bpm', '<f8'), ('interval', '<f8'), ('Altitude', '<f8'), ('Rate', '<f8'), ('Incline', '<f8'), ('Temp', '<f8'), ('getCombinedPedalSmoothness', '<f8')])

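It is not an array of tuples but a structured array: each row is a record whose fields are named after the header columns, and it merely prints like a tuple. With names=True, genfromtxt takes the field names from the first line that remains after skip_header lines have been skipped, so skip_header=1 discards the real header and uses the first row of data as names. That is probably not what you want (see how you are missing the first line of data?).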
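To see the difference on something small, here is a minimal sketch. The three-column CSV text and the variable names (csv_text, good, bad) are made up for illustration; only the genfromtxt arguments mirror the ones discussed above.

```python
import io
import numpy as np

# Hypothetical miniature of the CSV above: one header line, three data rows.
csv_text = "Time,Speed,Power\n0.1,4.3,0\n0.2,4.4,20\n0.3,4.5,13\n"

# names=True with the default skip_header=0: the header line supplies the field names.
good = np.genfromtxt(io.StringIO(csv_text), delimiter=',', names=True)
print(good.dtype.names)   # ('Time', 'Speed', 'Power')
print(good.shape)         # (3,) : one record per data row, not a 2-D matrix
print(good['Power'])      # a whole column, selected by field name

# skip_header=1 together with names=True: the header is skipped and the
# first data row is consumed as the names, so one row of data is lost.
bad = np.genfromtxt(io.StringIO(csv_text), delimiter=',', skip_header=1, names=True)
print(bad.shape)          # (2,)
```

A structured array is one-dimensional, so columns are selected with good['Power'] rather than good[:, 2].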

You can also drop names altogether and read the data into an ordinary 2-D numpy array. In that case skip_header=1 is correct, because the header line only needs to be skipped:

In [186]:

np.genfromtxt('temp.csv', delimiter=',',
              missing_values=0, skip_header=1, dtype=float,
              usecols=(0,2,3,4,5,6,7,8,9,10,11,17))
Out[186]:
array([[  1.66666680e-02,   4.35550640e+00,   0.00000000e+00,
          2.00000000e-03,   0.00000000e+00,   1.18000000e+02,
          1.00000000e+00,   6.84300000e+02,   0.00000000e+00,
          0.00000000e+00,   1.47100000e+01,  -1.00000000e+00],
       [  3.33333350e-02,   4.35550640e+00,   2.00000000e+01,
          2.00000000e-03,   0.00000000e+00,   1.19000000e+02,
          1.00000000e+00,   6.84300000e+02,   0.00000000e+00,
          0.00000000e+00,   1.47100000e+01,  -1.00000000e+00],
       [  5.00000000e-02,   4.44429100e+00,   1.30000000e+01,
          4.00000000e-03,   0.00000000e+00,   1.19000000e+02,
          1.00000000e+00,   6.84300000e+02,   0.00000000e+00,
          0.00000000e+00,   1.47100000e+01,  -1.00000000e+00]])
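If you want the column names and a plain 2-D array, one option (not something the question asked for, and assuming NumPy 1.16 or later) is to read the structured array first and then convert it with numpy.lib.recfunctions.structured_to_unstructured. The CSV text and variable names here are again made up for illustration.

```python
import io
import numpy as np
from numpy.lib import recfunctions as rfn

# Hypothetical miniature CSV: one header line, two data rows.
csv_text = "Time,Speed,Power\n0.1,4.3,0\n0.2,4.4,20\n"

# Read with names so the columns stay addressable by name.
structured = np.genfromtxt(io.StringIO(csv_text), delimiter=',', names=True)

# Convert the records into an ordinary float array of shape (rows, columns).
plain = rfn.structured_to_unstructured(structured)
print(plain.shape)   # (2, 3)
print(plain[:, 1])   # the Speed column, now selected by position
```

This only makes sense when every field has the same dtype, which is the case here since everything is read as float.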

Upvotes: 3
