Mornor

Reputation: 3801

Could not convert string to float using dtype in nparray

I have a raw np.array() in which I would like to explicitly specify the type of each column.

The data array has the following form:

data = [
    [name1, float1, float2, float3],
    # ...
    [nameX, floatX, floatX, floatX],
]

Now, to explicitly specify the column type, I do the following:

data = np.array(data, dtype=[('name', str), ('amount0', float), ('amount1', float), ('amount2', float)])

Quite straightforward.

However, for some reason I do not understand, I get the following error:

ValueError: could not convert string to float: 'john_smith'

'john_smith' is a value from the first column of data (data[:,0]). Since I defined that column to be a string, I don't understand why NumPy tries to convert it to a float.

Also, 'john_smith' is neither the first nor the last element of the data array.

So, what is wrong here?

Minimal example that reproduces the error:

import numpy as np

row1 = ['julien', '6270', '17', '0.2703992365198028']
row2 = ['john_smith', '2983', '10', '0.3341129301703976']
row3 = ['helo', '19', '0', '0.0']

data = []
data.append(row1)
data.append(row2)
data.append(row3)
data = np.array(data)

data = np.array(data, dtype=[('name', str), ('amount0', float), ('amount1', float), ('amount2', float)])

Upvotes: 1

Views: 4567

Answers (2)

Edwin Pozharski

Reputation: 31

The data supplied to numpy.array should be a list of tuples, not a list of lists. Your example needs two changes: make each row a tuple, and don't convert data into an array before applying the dtype.

import numpy as np

row1 = ('julien', '6270', '17', '0.2703992365198028')
row2 = ('john_smith', '2983', '10', '0.3341129301703976')
row3 = ('helo', '19', '0', '0.0')

data = []
data.append(row1)
data.append(row2)
data.append(row3)

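# A bare str in a structured dtype gives a zero-width string field, so the
# names would come out empty; a fixed-width unicode type such as 'U32' avoids that.
data = np.array(data, dtype=[('name', 'U32'), ('amount0', float), ('amount1', float), ('amount2', float)])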
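If the rows already exist as a list of lists, as in the question, they can be converted to tuples on the way in. The following is a minimal sketch under a few assumptions: the names `rows`, `record_dtype` and `records` are made up for illustration, the width in `'U32'` is an arbitrary choice, and the numeric strings are converted with `float()` explicitly rather than relying on NumPy to do it.

```python
import numpy as np

# Rows as in the question: a list of lists of strings.
rows = [
    ['julien', '6270', '17', '0.2703992365198028'],
    ['john_smith', '2983', '10', '0.3341129301703976'],
    ['helo', '19', '0', '0.0'],
]

# Hypothetical record layout; 'U32' is an assumed maximum name length.
record_dtype = [('name', 'U32'), ('amount0', float),
                ('amount1', float), ('amount2', float)]

# Each record must be a tuple; a nested list is read as another array dimension.
records = [(name, float(a0), float(a1), float(a2)) for name, a0, a1, a2 in rows]

arr = np.array(records, dtype=record_dtype)
print(arr['name'])
print(arr['amount0'])
```

The result is a one-dimensional array of three records, not a 3x4 table.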

Upvotes: 1

juanpa.arrivillaga

Reputation: 96349

You seem to be misunderstanding how structured arrays work. You don't specify the data type of a "column"; you specify the datatype of a structure, and you build an array of structs. NumPy arrays are homogeneous: every element has the same dtype, so you cannot mix datatypes across columns. So, you could do this:

>>> e1 = ('julien', 6270, 17, 0.2703992365198028)
>>> e2 = ('john_smith', '2983', '10', '0.3341129301703976')
>>> e3 = ('helo', '19', '0', '0.0')
>>> data = [e1, e2, e3]
>>> arr = np.array(data, dtype=[('name', '<U255'), ('amount0', float), ('amount1', float), ('amount2', float)])
>>> arr
array([('julien', 6270.0, 17.0, 0.2703992365198028),
       ('john_smith', 2983.0, 10.0, 0.3341129301703976),
       ('helo', 19.0, 0.0, 0.0)],
      dtype=[('name', '<U255'), ('amount0', '<f8'), ('amount1', '<f8'), ('amount2', '<f8')])
>>>

But notice,

>>> arr.shape
(3,)

There are no columns. Of course, we can pretend that there are by indexing with a field name:

>>> arr['name']
array(['julien', 'john_smith', 'helo'],
      dtype='<U255')
>>> arr[0]['name']
'julien'
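Because each field comes back as an ordinary one-dimensional array, the usual NumPy operations apply to it. Here is a small sketch, assuming the same three records as above (rebuilt here with numeric values so the snippet stands on its own); the names `mean_amount0` and `big_spenders` and the threshold of 1000 are purely illustrative.

```python
import numpy as np

arr = np.array(
    [('julien', 6270.0, 17.0, 0.2703992365198028),
     ('john_smith', 2983.0, 10.0, 0.3341129301703976),
     ('helo', 19.0, 0.0, 0.0)],
    dtype=[('name', 'U255'), ('amount0', float),
           ('amount1', float), ('amount2', float)],
)

# Aggregate over one field, as you would over a column.
mean_amount0 = arr['amount0'].mean()

# Filter whole records with a boolean mask built from one field.
big_spenders = arr[arr['amount0'] > 1000]['name']

print(mean_amount0)
print(big_spenders)
```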

But honestly, it sounds like you really want a pandas.DataFrame

>>> import pandas as pd
>>> pd.DataFrame(data, columns=['name', 'amount0', 'amount1', 'amount2'])
         name amount0 amount1             amount2
0      julien    6270      17            0.270399
1  john_smith    2983      10  0.3341129301703976
2        helo      19       0                 0.0
>>>
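Note that the amount columns in that frame still hold the original mix of numbers and strings. A possible way to get per-column numeric types is `DataFrame.astype` with a mapping of column names to types. This is only a sketch: `rows` and `amount_cols` are illustrative names, and it assumes every amount string parses as a float.

```python
import pandas as pd

rows = [
    ('julien', '6270', '17', '0.2703992365198028'),
    ('john_smith', '2983', '10', '0.3341129301703976'),
    ('helo', '19', '0', '0.0'),
]

df = pd.DataFrame(rows, columns=['name', 'amount0', 'amount1', 'amount2'])

# Convert only the amount columns; 'name' keeps its string values.
amount_cols = ['amount0', 'amount1', 'amount2']
df = df.astype({col: float for col in amount_cols})

print(df.dtypes)
```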

Notice that I had to replace your str datatype with '<U255', a fixed-width unicode type. A bare str in a structured dtype does not carry a usable length (and on Python 2, numpy interprets str as byte strings). You could also make your strings bytes objects by encoding them, which is probably the way to go if you are only working with ASCII characters.
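For the bytes route, a sketch could look like the following. It assumes the names are pure ASCII and at most 16 bytes long (the `'S16'` width is an arbitrary choice for this example), and `rows` is an illustrative name.

```python
import numpy as np

rows = [
    ('julien', '6270', '17', '0.2703992365198028'),
    ('john_smith', '2983', '10', '0.3341129301703976'),
    ('helo', '19', '0', '0.0'),
]

# 'S16' is a fixed-width byte-string field; longer names would be truncated.
bytes_dtype = [('name', 'S16'), ('amount0', float),
               ('amount1', float), ('amount2', float)]

arr = np.array(
    [(name.encode('ascii'), float(a0), float(a1), float(a2))
     for name, a0, a1, a2 in rows],
    dtype=bytes_dtype,
)

print(arr['name'])                     # byte strings
print(arr['name'][1].decode('ascii'))  # back to str when needed
```

A byte-string field uses one byte per character, whereas a unicode field uses four, which is the main reason to prefer it for ASCII-only data.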

Upvotes: 2
