Reputation: 1121
I have an array of data with multiple rows, like so:
['20.57', '17.77', '132.9', ..., '0.07017', '0.1812', '0.05667']
and I need to convert it to floats. The first row is feature names.
When I try to do the following:
for i in features[1,:]:
    i = i.astype(float)
and print each i, it prints floats:
20.57
17.77
132.9
and so on
However, when I print "features", I get:
['20.57', '17.77', '132.9', ..., '0.07017', '0.1812', '0.05667']
What am I doing wrong, and how do I fix it?
Upvotes: 0
Views: 872
Reputation: 231385
Make a simpler array from the list:
In [26]: features = ['20.57', '17.77', '132.9', '0.07017', '0.1812', '0.667']
In [27]: features
Out[27]: ['20.57', '17.77', '132.9', '0.07017', '0.1812', '0.667']
In [28]: features = np.array(features)
In [29]: features
Out[29]:
array(['20.57', '17.77', '132.9', '0.07017', '0.1812', '0.667'],
dtype='<U7')
Note that this is an array of strings. I can use astype to make a NEW array of floats:
In [30]: features.astype(float)
Out[30]:
array([ 2.05700000e+01, 1.77700000e+01, 1.32900000e+02,
7.01700000e-02, 1.81200000e-01, 6.67000000e-01])
But that does not change the original features array; it still holds strings:
In [31]: features
Out[31]:
array(['20.57', '17.77', '132.9', '0.07017', '0.1812', '0.667'],
dtype='<U7')
I'd have to reassign the features variable to get a new float array:
In [32]: features = features.astype(float)
In [33]: features
Out[33]:
array([ 2.05700000e+01, 1.77700000e+01, 1.32900000e+02,
7.01700000e-02, 1.81200000e-01, 6.67000000e-01])
I could have gone directly from the list of strings to an array of floats with:
In [34]: features = ['20.57', '17.77', '132.9', '0.07017', '0.1812', '0.667']
In [35]: features = np.array(features,float)
In [36]: features
Out[36]:
array([ 2.05700000e+01, 1.77700000e+01, 1.32900000e+02,
7.01700000e-02, 1.81200000e-01, 6.67000000e-01])
But if any string in the list can't be converted to a float, I'll get either an error (when I ask for float, as above) or a string array (when I let numpy pick the dtype).
Also, I can't make the change in place or piecemeal:
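To make those two outcomes concrete, here is a small sketch. The `dirty` list with a header-like label is a made-up example, not the asker's data:

```python
import numpy as np

clean = ['20.57', '17.77', '132.9']
dirty = ['radius', '20.57', '17.77']  # hypothetical list containing a non-numeric label

# With an explicit float dtype, every element has to parse as a float.
floats = np.array(clean, dtype=float)

try:
    np.array(dirty, dtype=float)
    conversion_failed = False
except ValueError:
    # 'radius' cannot be parsed, so the whole conversion is rejected
    conversion_failed = True

# Without a dtype, numpy falls back to a string array.
strings = np.array(dirty)
print(floats.dtype, conversion_failed, strings.dtype.kind)
```

So a stray label in the data shows up either as a ValueError or as a string dtype, depending on whether a dtype was requested.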
In [40]: features[1] = float(features[1])
In [41]: features
Out[41]:
array(['20.57', '17.77', '132.9', '0.07017', '0.1812', '0.667'],
dtype='<U7')
The features array is fixed as U7; I can't change it to float. I can only make a new array with values derived from the original.
Upvotes: 0
Reputation: 1121
It worked when I did the following:
floatfeatures = features[1:]
floatfeatures = np.array(floatfeatures, dtype=float)
I'm not sure if this is the most graceful way to do it, but it worked.
Upvotes: 0
Reputation: 14399
Unless the dtype of your array is object (don't do this), or you have a structured array, you can't have multiple dtypes. So if you have one string in your array, numpy will cast them all to strings.
Best bet is to split the array into two parts.
fNames=features[0,:]
features=features[1,:].astype(float)
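A minimal sketch of that split on a small made-up 2-D array (the names and values are assumptions, not the asker's data). It slices with 1: on the assumption that there are several data rows below the header; features[1,:] alone would pick out only the first of them.

```python
import numpy as np

# Hypothetical 2-D string array whose first row holds the feature names.
features = np.array([
    ['radius', 'texture', 'perimeter'],
    ['20.57', '17.77', '132.9'],
    ['19.69', '21.25', '130.0'],
])

names = features[0, :]                   # header row, still strings
values = features[1:, :].astype(float)   # all data rows, as a new float array

print(names)
print(values.dtype, values.shape)
```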
If you have lots of columns with different types, you probably want to cast it into a structured array.
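For the structured-array route, a rough sketch might look like this. The field names (diagnosis, radius, texture) and the records are invented for illustration:

```python
import numpy as np

# Hypothetical records mixing a one-character label with two float measurements.
rows = [('M', 20.57, 17.77), ('B', 13.54, 14.36)]
record_dtype = [('diagnosis', 'U1'), ('radius', 'f8'), ('texture', 'f8')]

# Each field keeps its own dtype, so nothing is cast to string.
records = np.array(rows, dtype=record_dtype)

labels = records['diagnosis']   # string column
radius = records['radius']      # float64 column
print(labels, radius.mean())
```

Each field is accessed by name and behaves like an ordinary array of its own type.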
Upvotes: 1
Reputation: 1585
You could do this by creating a new list:
features = ['20.57', '17.77', '132.9', '0.07017', '0.1812', '0.05667']
featuresFloat = []
for i in features:
    featuresFloat.append(float(i))
print(featuresFloat)
This may not be the best solution for large datasets, though it does give readable code.
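For comparison, here is the same conversion written as a list comprehension and as a single numpy call. This is only a sketch of two alternatives, with variable names chosen for illustration:

```python
import numpy as np

features = ['20.57', '17.77', '132.9', '0.07017', '0.1812', '0.05667']

# Pure-Python version: builds a new list of floats in one expression.
features_float = [float(x) for x in features]

# numpy version: converts the whole list to a float array at once.
features_array = np.array(features, dtype=float)

print(features_float)
print(features_array)
```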
Upvotes: 1
Reputation: 18628
Just do
features=features.astype(float)
When you do i=i.astype(float), you only rebind the loop variable i; the array itself is not affected. And remember that it's often a bad idea to loop over an array: use array methods instead.
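A short sketch of the difference, using a made-up three-element array:

```python
import numpy as np

features = np.array(['20.57', '17.77', '132.9'])

for i in features:
    # i is a temporary scalar; assigning to it never writes back into features
    i = i.astype(float)

still_strings = features.dtype.kind == 'U'   # the array is untouched by the loop

# astype returns a new array, so the result has to be assigned to a name.
features = features.astype(float)
print(still_strings, features.dtype)
```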
Upvotes: 0