Reputation: 720
I am trying to learn Python and NumPy, so please bear with me. I am using numpy.genfromtxt to import a CSV file into a matrix. The CSV looks as follows:
Time(min),Nm,Speed,Power,Distance,Rpm,Bpm,interval,Altitude,Rate,Incline,Temp,PowerBalance,LeftTorqueEffectiveness,RightTorqueEffectiveness,getLeftPedalSmoothness,getRightPedalSmoothness,getCombinedPedalSmoothness,THb,SmO2,km
0.016666668,,4.3555064,0,0.002,0,118,1,684.3,0.0,0.0,14.71,50,-1.0,-1.0,-1.0,-1.0,-1.0,311.72,311.72
0.033333335,,4.3555064,20,0.002,0,119,1,684.3,0.0,0.0,14.71,50,-1.0,-1.0,-1.0,-1.0,-1.0,311.72,311.72
0.05,,4.444291,13,0.004,0,119,1,684.3,0.0,0.0,14.71,50,-1.0,-1.0,-1.0,-1.0,-1.0,311.72,311.72
Now I run:
import numpy as np

matrixCsv = np.genfromtxt(open(csvFile, "rb"), delimiter=',',
                          missing_values=0, skip_header=1, dtype=float,
                          usecols=(0,2,3,4,5,6,7,8,9,10,11,17), names=True)
and I get:
[ (0.033333335, 4.3555064, 20.0, 0.002, 0.0, 119.0, 1.0, 684.3, 0.0, 0.0, 14.71, -1.0)
(0.05, 4.444291, 13.0, 0.004, 0.0, 119.0, 1.0, 684.3, 0.0, 0.0, 14.71, -1.0)
(0.06666667, 4.4781966, 16.0, 0.006, 0.0, 120.0, 1.0, 684.3, 0.0, 0.0, 14.71, -1.0)
...,
To me, this looks like tuples wrapped in an array. But why tuples? I understand that NumPy arrays/matrices need to be homogeneous, and that NumPy makes a tuple out of inhomogeneous data. But why is my data inhomogeneous? I do not understand...
Upvotes: 4
Views: 3537
Reputation: 54330
You are confusing how skip_header and names work together. The right way to read the data and use the first row as the variable (field) names is:
In [185]:
np.genfromtxt('temp.csv', delimiter=',',
              missing_values=0, skip_header=0, dtype=float,
              usecols=(0,2,3,4,5,6,7,8,9,10,11,17), names=True)
Out[185]:
array([ (0.016666668, 4.3555064, 0.0, 0.002, 0.0, 118.0, 1.0, 684.3, 0.0, 0.0, 14.71, -1.0),
(0.033333335, 4.3555064, 20.0, 0.002, 0.0, 119.0, 1.0, 684.3, 0.0, 0.0, 14.71, -1.0),
(0.05, 4.444291, 13.0, 0.004, 0.0, 119.0, 1.0, 684.3, 0.0, 0.0, 14.71, -1.0)],
dtype=[('Timemin', '<f8'), ('Speed', '<f8'), ('Power', '<f8'), ('Distance', '<f8'), ('Rpm', '<f8'), ('Bpm', '<f8'), ('interval', '<f8'), ('Altitude', '<f8'), ('Rate', '<f8'), ('Incline', '<f8'), ('Temp', '<f8'), ('getCombinedPedalSmoothness', '<f8')])
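It is not an array of tuples but a structured array. Because of names=True, every selected column becomes a named field, and each row is printed as a tuple-like record; all fields are floats ('<f8'), so your data itself is homogeneous. With skip_header=1, the header line is skipped and the first row of data is used as the field names, which is probably not what you want (notice that the first line of data is missing from your output).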
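The effect of skip_header on names=True can be reproduced with a minimal sketch. It assumes a hypothetical three-column excerpt of the question's CSV held in an io.StringIO buffer instead of a real file:

```python
import io

import numpy as np

# Hypothetical excerpt of the question's CSV (first three columns only)
csv_text = """Time(min),Nm,Speed
0.016666668,,4.3555064
0.033333335,,4.3555064
0.05,,4.444291
"""

# skip_header=1 + names=True: the header line is skipped, so the first
# *data* row is consumed as the field names and lost as data
wrong = np.genfromtxt(io.StringIO(csv_text), delimiter=",",
                      skip_header=1, usecols=(0, 2), names=True)

# skip_header=0 + names=True: the header line supplies the field names
right = np.genfromtxt(io.StringIO(csv_text), delimiter=",",
                      skip_header=0, usecols=(0, 2), names=True)

print(wrong.dtype.names)       # names derived from the first data row
print(len(wrong), len(right))  # 2 3
print(right.dtype.names)       # ('Timemin', 'Speed')
print(right["Speed"])          # one named float field, accessed by name
```

In the first call only two records remain, because the first data row was turned into field names.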
You can also drop names and read the data into an ordinary 2-D NumPy array; in that case skip_header=1 is what skips the header line:
In [186]:
np.genfromtxt('temp.csv', delimiter=',',
              missing_values=0, skip_header=1, dtype=float,
              usecols=(0,2,3,4,5,6,7,8,9,10,11,17))
Out[186]:
array([[ 1.66666680e-02, 4.35550640e+00, 0.00000000e+00,
2.00000000e-03, 0.00000000e+00, 1.18000000e+02,
1.00000000e+00, 6.84300000e+02, 0.00000000e+00,
0.00000000e+00, 1.47100000e+01, -1.00000000e+00],
[ 3.33333350e-02, 4.35550640e+00, 2.00000000e+01,
2.00000000e-03, 0.00000000e+00, 1.19000000e+02,
1.00000000e+00, 6.84300000e+02, 0.00000000e+00,
0.00000000e+00, 1.47100000e+01, -1.00000000e+00],
[ 5.00000000e-02, 4.44429100e+00, 1.30000000e+01,
4.00000000e-03, 0.00000000e+00, 1.19000000e+02,
1.00000000e+00, 6.84300000e+02, 0.00000000e+00,
0.00000000e+00, 1.47100000e+01, -1.00000000e+00]])
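A small sketch comparing the two approaches, again on a hypothetical excerpt of the CSV in an io.StringIO buffer. Field access on the structured array (rec["Power"]) matches column access on the plain array (plain[:, 2]). If you already have a structured array, numpy.lib.recfunctions.structured_to_unstructured (available in NumPy 1.16 and later) converts it into a regular 2-D array:

```python
import io

import numpy as np
from numpy.lib import recfunctions as rfn

# Hypothetical excerpt of the question's CSV (first four columns only)
csv_text = """Time(min),Nm,Speed,Power
0.016666668,,4.3555064,0
0.033333335,,4.3555064,20
0.05,,4.444291,13
"""

# Structured array: one named float field per selected column
rec = np.genfromtxt(io.StringIO(csv_text), delimiter=",",
                    usecols=(0, 2, 3), names=True)

# Plain 2-D float array: no names, so the header line must be skipped
plain = np.genfromtxt(io.StringIO(csv_text), delimiter=",",
                      skip_header=1, usecols=(0, 2, 3))

print(rec.shape, plain.shape)  # (3,) (3, 3)

# A named field of the structured array equals a column of the 2-D array
print(np.array_equal(rec["Power"], plain[:, 2]))  # True

# Convert the structured array into a regular 2-D float array
as_2d = rfn.structured_to_unstructured(rec)
print(as_2d.shape)  # (3, 3)
```

The structured form is convenient when you want to refer to columns by name; the plain 2-D form is what you want for matrix arithmetic and slicing such as plain[:, 1:].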
Upvotes: 3