nanachan

Reputation: 1121

After converting str to float, numpy array returns strings

I have an array of data with multiple rows, like so:

['20.57', '17.77', '132.9', ..., '0.07017', '0.1812', '0.05667']

and I need to convert it to floats. The first row is feature names.

When I try to do the following:

for i in features[1,:]:
    i = i.astype(float)

and print each i, it prints floats:

20.57
17.77
132.9
and so on

However, when I print "features", I get:

['20.57', '17.77', '132.9', ..., '0.07017', '0.1812', '0.05667']

What am I doing wrong, and how do I fix this?

Upvotes: 0

Views: 872

Answers (5)

hpaulj

Reputation: 231385

Make a simpler array from the list:

In [26]: features = ['20.57', '17.77', '132.9', '0.07017', '0.1812', '0.667']
In [27]: features
Out[27]: ['20.57', '17.77', '132.9', '0.07017', '0.1812', '0.667']
In [28]: features = np.array(features)
In [29]: features
Out[29]: 
array(['20.57', '17.77', '132.9', '0.07017', '0.1812', '0.667'], 
      dtype='<U7')

Note that this is an array of strings.

I can use astype to make a NEW array of floats:

In [30]: features.astype(float)
Out[30]: 
array([  2.05700000e+01,   1.77700000e+01,   1.32900000e+02,
         7.01700000e-02,   1.81200000e-01,   6.67000000e-01])

but that does not change the original features array. It is still strings.

In [31]: features
Out[31]: 
array(['20.57', '17.77', '132.9', '0.07017', '0.1812', '0.667'], 
      dtype='<U7')

I'd have to reassign the features variable to get a new float array

In [32]: features = features.astype(float)
In [33]: features
Out[33]: 
array([  2.05700000e+01,   1.77700000e+01,   1.32900000e+02,
         7.01700000e-02,   1.81200000e-01,   6.67000000e-01])

I could have gone directly from the list of strings to an array of floats with:

In [34]: features = ['20.57', '17.77', '132.9', '0.07017', '0.1812', '0.667']
In [35]: features = np.array(features,float)
In [36]: features
Out[36]: 
array([  2.05700000e+01,   1.77700000e+01,   1.32900000e+02,
         7.01700000e-02,   1.81200000e-01,   6.67000000e-01])

But if any string in the list can't be converted to a float, I'll get either an error (when I ask for a float dtype) or a string array (when I don't specify a dtype).
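A minimal sketch of both outcomes, assuming a hypothetical list in which a label ('radius', made up for illustration) is mixed in with the numeric strings:

```python
import numpy as np

# Hypothetical input: a feature name mixed in with numeric strings.
mixed = ['radius', '20.57', '17.77']

# Without a dtype, NumPy keeps everything as strings.
as_strings = np.array(mixed)
print(as_strings.dtype.kind)  # 'U' marks a unicode string array

# With dtype=float, the non-numeric entry raises ValueError.
try:
    np.array(mixed, dtype=float)
    conversion_failed = False
except ValueError:
    conversion_failed = True
print(conversion_failed)
```

This is why the header row has to be separated from the data before converting.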

Also, I can't make the change in place or piecemeal:

In [40]: features[1] = float(features[1])
In [41]: features
Out[41]: 
array(['20.57', '17.77', '132.9', '0.07017', '0.1812', '0.667'], 
      dtype='<U7')

The features array is fixed as U7; I can't change it to float; I can only make a new array with values derived from the original.
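The same point as a short script, a sketch using a shortened version of the sample data: assigning a float into a string array stores it as a string again, and only rebinding the name to the result of astype gives a float array.

```python
import numpy as np

features = np.array(['20.57', '17.77', '132.9'])
original_dtype = features.dtype

# Assigning a float into a string array converts the float back to a string.
features[1] = float(features[1])
dtype_unchanged = features.dtype == original_dtype
print(dtype_unchanged)

# astype returns a new array; rebind the name to keep the result.
features = features.astype(float)
print(features.dtype)
```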

Upvotes: 0

nanachan

Reputation: 1121

It worked when I did the following:

floatfeatures = features[1:]
floatfeatures = np.array(floatfeatures, dtype=float)

I'm not sure if this is the most graceful way to do it, but it worked.

Upvotes: 0

Daniel F

Reputation: 14399

Unless the dtype of your array is object (don't do this), or you have a structured array, you can't have multiple dtypes. So if you have one string in your array, numpy will cast all the elements to strings.

Best bet is to split the array into two parts.

fNames=features[0,:]
features=features[1,:].astype(float)

If you have lots of columns with different types, you probably want to cast it into a structured array.
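A rough sketch of what a structured array could look like. The rows, field names, and field types below are all hypothetical, chosen only to illustrate mixing a text column with numeric columns:

```python
import numpy as np

# Hypothetical records: one text column and two numeric columns, all as strings.
raw_rows = [('sample_a', '20.57', '3'), ('sample_b', '17.77', '5')]

# Field names and types are illustrative assumptions, not from the question.
record_dtype = np.dtype([('label', 'U16'), ('radius', float), ('count', int)])

# Convert each field explicitly, then build the structured array from tuples.
records = np.array(
    [(label, float(radius), int(count)) for label, radius, count in raw_rows],
    dtype=record_dtype,
)

print(records['radius'])  # the float column
print(records['label'])   # the string column
```

Each field keeps its own dtype, so the numeric columns can be used in arithmetic while the labels stay as strings.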

Upvotes: 1

CodeCupboard

Reputation: 1585

You could do this by creating a new list

features = ['20.57', '17.77', '132.9', '0.07017', '0.1812', '0.05667']
featuresFloat = []
for i in features:
    featuresFloat.append(float(i))
print(featuresFloat)

This may not be the best solution for large datasets, though it does give readable code.

Upvotes: 1

B. M.

Reputation: 18628

Just do

features=features.astype(float)

When you do i=i.astype(float), you only rebind the loop variable i; the array itself is not affected. And remember that it's often a bad idea to loop over an array: use array methods instead.
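To see the difference side by side, here is a small sketch with a shortened version of the sample data:

```python
import numpy as np

features = np.array(['20.57', '17.77', '132.9'])

for i in features:
    # Rebinds the local name i to a new float; features is untouched.
    i = i.astype(float)

print(features.dtype.kind)  # still 'U', a string array

# One array-level call converts every element and returns a new array.
converted = features.astype(float)
print(converted)
```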

Upvotes: 0
