Could not convert string to float using dtype in nparray

Question

I have a raw np.array() for which I would like to explicitly specify the type of each column.

The data array has the following form:

data = [
    [name1, float1, float2, float3]
    # ... 
    [nameX, floatX, floatX, floatX]
]

Now, to explicitly specify the column type, I do the following:

data = np.array(data, dtype=[('name', str), ('amount0', float), ('amount1', float), ('amount2', float)])

Quite straightforward.

However, for some reason I do not understand, I get the following error:

ValueError: could not convert string to float: 'john_smith'

'john_smith' is a value from the first column of data (data[:,0]). Since I defined this column to be a string, I don't understand why NumPy tries to convert it to a float.

Also: 'john_smith' is neither the first nor the last element of the data array.

So, what is wrong here?

Working example:

import numpy as np

row1 = ['julien', '6270', '17', '0.2703992365198028']
row2 = ['john_smith', '2983', '10', '0.3341129301703976']
row3 = ['helo', '19', '0', '0.0']

data = []
data.append(row1)
data.append(row2)
data.append(row3)
data = np.array(data)

data = np.array(data, dtype=[('name', str), ('amount0', float), ('amount1', float), ('amount2', float)])

juanpa.arrivillaga · Accepted Answer

You seem to be misunderstanding how structured arrays work. You don't specify the data type of a "column"; you specify the data type of a structure, and you build an array of structs. NumPy arrays are homogeneous, so you cannot have mixed data types across elements. So, you could do this:

>>> e1 = ('julien', 6270, 17, 0.2703992365198028)
>>> e2 = ('john_smith', '2983', '10', '0.3341129301703976')
>>> e3 = ('helo', '19', '0', '0.0')
>>> data = [e1, e2, e3]
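>>> arr = np.array(data, dtype=[('name', '<U10'), ('amount0', float), ('amount1', float), ('amount2', float)])
>>> arr
array([('julien', 6270.0, 17.0, 0.2703992365198028),
       ('john_smith', 2983.0, 10.0, 0.3341129301703976),
       ('helo', 19.0, 0.0, 0.0)],
      dtype=[('name', '<U10'), ('amount0', '<f8'), ('amount1', '<f8'), ('amount2', '<f8')])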

But notice,

>>> arr.shape
(3,)

There are no columns. Of course, we can access the fields by name and pretend that they are columns:

>>> arr['name']
array(['julien', 'john_smith', 'helo'],
      dtype='<U10')
>>> arr[0]['name']
'julien'
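
To connect this back to the rows in the question, here is a minimal sketch of one way to turn a list of lists into records that a structured dtype accepts. It relies on NumPy treating tuples as single records, whereas nested lists are read as extra array dimensions. The names rows, records and the 'U10' width are assumptions chosen for illustration; the width is sized to fit 'john_smith'.

```python
import numpy as np

# Rows as they appear in the question: lists of strings.
rows = [
    ['julien', '6270', '17', '0.2703992365198028'],
    ['john_smith', '2983', '10', '0.3341129301703976'],
    ['helo', '19', '0', '0.0'],
]

# One record = one tuple. The numeric fields are converted explicitly so
# that nothing depends on NumPy parsing the strings.
records = [(name, float(a0), float(a1), float(a2)) for name, a0, a1, a2 in rows]

# 'U10' (unicode, up to 10 characters) is an assumption sized to fit 'john_smith'.
dtype = [('name', 'U10'), ('amount0', float), ('amount1', float), ('amount2', float)]
arr = np.array(records, dtype=dtype)

print(arr.shape)       # one-dimensional: one entry per record
print(arr['amount0'])  # field access by name
```

Longer names would be silently truncated to the declared width, so the width should be chosen from the actual data.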

But honestly, it sounds like you really want a pandas.DataFrame:

>>> import pandas as pd
>>> pd.DataFrame(data, columns=['name', 'amount0', 'amount1', 'amount2'])
         name amount0 amount1             amount2
0      julien    6270      17            0.270399
1  john_smith    2983      10  0.3341129301703976
2        helo      19       0                 0.0
>>>
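
If the amounts arrive as strings, as in the question, the DataFrame columns will hold Python objects rather than floats. A possible follow-up, sketched here under the assumption that every amount parses cleanly as a number, is to cast those columns with DataFrame.astype. The name amount_cols is introduced only for this example.

```python
import pandas as pd

data = [
    ('julien', '6270', '17', '0.2703992365198028'),
    ('john_smith', '2983', '10', '0.3341129301703976'),
    ('helo', '19', '0', '0.0'),
]
df = pd.DataFrame(data, columns=['name', 'amount0', 'amount1', 'amount2'])

# Cast only the numeric columns; 'name' is left untouched.
amount_cols = ['amount0', 'amount1', 'amount2']
df = df.astype({col: float for col in amount_cols})

print(df.dtypes)
```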

Notice, I had to modify your str datatype to accept unicode, because numpy interprets str as byte-strings. You could always make your strings bytes objects by encoding them. This is probably the way to go if you are only working with ASCII characters.
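
As a rough sketch of the bytes route, assuming all names are pure ASCII, each name can be encoded before building the array and stored in a fixed-width byte-string field. The 'S10' width and the names records and encoded are assumptions for illustration.

```python
import numpy as np

records = [
    ('julien', 6270.0, 17.0, 0.2703992365198028),
    ('john_smith', 2983.0, 10.0, 0.3341129301703976),
    ('helo', 19.0, 0.0, 0.0),
]

# Encode each name to bytes; this raises UnicodeEncodeError on non-ASCII input.
encoded = [(name.encode('ascii'), a0, a1, a2) for name, a0, a1, a2 in records]

# 'S10' (byte string, up to 10 bytes) is an assumption sized to fit b'john_smith'.
dtype = [('name', 'S10'), ('amount0', float), ('amount1', float), ('amount2', float)]
arr = np.array(encoded, dtype=dtype)

print(arr['name'])                     # byte strings
print(arr['name'][1].decode('ascii'))  # decode when a str is needed again
```

The trade-off is that every read of the name field yields bytes, so code that expects str has to decode first.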
