Reputation: 3801
I have a raw np.array() for which I would like to explicitly specify the type of each column.
The data array has the following form:
data = [
[name1, float1, float2, float3]
# ...
[nameX, floatX, floatX, floatX]
]
Now, to explicitly specify the column type, I do the following:
data = np.array(data, dtype=[('name', str), ('amount0', float), ('amount1', float), ('amount2', float)])
Quite straightforward.
However, for some reason I do not understand, I get the following error:
ValueError: could not convert string to float: 'john_smith'
'john_smith' is a value from the first column of data (data[:,0]). Since I defined this column to be a string, I don't understand why numpy tries to convert it to a float.
Also, 'john_smith' is neither the first nor the last element of the data array.
So, what is wrong here?
Working example:
import numpy as np
row1 = ['julien', '6270', '17', '0.2703992365198028']
row2 = ['john_smith', '2983', '10', '0.3341129301703976']
row3 = ['helo', '19', '0', '0.0']
data = []
data.append(row1)
data.append(row2)
data.append(row3)
data = np.array(data)
data = np.array(data, dtype=[('name', str), ('amount0', float), ('amount1', float), ('amount2', float)])
Upvotes: 1
Views: 4567
Reputation: 31
When you pass a structured dtype, the data supplied to numpy.array should be a list of tuples, not a list of lists. Your example needs two changes: make each row a tuple, and don't convert data into a plain array before applying the dtype.
import numpy as np
row1 = ('julien', '6270', '17', '0.2703992365198028')
row2 = ('john_smith', '2983', '10', '0.3341129301703976')
row3 = ('helo', '19', '0', '0.0')
data = []
data.append(row1)
data.append(row2)
data.append(row3)
# 'U10' is an assumed width; a bare str gives the field no length, so the names would be cut off
data = np.array(data, dtype=[('name', 'U10'), ('amount0', float), ('amount1', float), ('amount2', float)])
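If the rows already exist as a list of lists, as in the question, they can be converted to tuples in one pass. This is only a minimal sketch: the names rows, record_dtype and records are mine, and the 'U10' string width is an assumption that should be widened for longer names.

```python
import numpy as np

rows = [
    ['julien', '6270', '17', '0.2703992365198028'],
    ['john_smith', '2983', '10', '0.3341129301703976'],
    ['helo', '19', '0', '0.0'],
]

# 'U10' (unicode, up to 10 characters) is an assumed width, not part of the question.
record_dtype = [('name', 'U10'), ('amount0', float), ('amount1', float), ('amount2', float)]

# Each record must be a tuple; nested lists are read as extra array dimensions.
records = np.array([tuple(row) for row in rows], dtype=record_dtype)

print(records['name'])
print(records['amount0'])
```

The numeric strings are converted to floats field by field, so the name field is never passed through float().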
Upvotes: 1
Reputation: 96349
You seem to be misunderstanding how structured arrays work. You don't specify the data type of a "column"; you specify the datatype of a structure, and you build an array of structs. Numpy arrays are homogeneous: you cannot have mixed datatypes within one array, but every element can be a struct with the same fields. So, you could do this:
>>> e1 = ('julien', 6270, 17, 0.2703992365198028)
>>> e2 = ('john_smith', '2983', '10', '0.3341129301703976')
>>> e3 = ('helo', '19', '0', '0.0')
>>> data = [e1, e2, e3]
>>> arr = np.array(data, dtype=[('name', '<U255'), ('amount0', float), ('amount1', float), ('amount2', float)])
>>> arr
array([('julien', 6270.0, 17.0, 0.2703992365198028),
('john_smith', 2983.0, 10.0, 0.3341129301703976),
('helo', 19.0, 0.0, 0.0)],
dtype=[('name', '<U255'), ('amount0', '<f8'), ('amount1', '<f8'), ('amount2', '<f8')])
>>>
But notice,
>>> arr.shape
(3,)
There are no columns. Of course, we can pretend there are by indexing with a field name:
>>> arr['name']
array(['julien', 'john_smith', 'helo'],
dtype='<U255')
>>> arr[0]['name']
'julien'
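Because each field comes back as an ordinary one-dimensional array, the usual numpy operations apply per field. A small sketch using the same records and dtype as above; the names total_amount0 and large are just for illustration:

```python
import numpy as np

arr = np.array(
    [('julien', 6270, 17, 0.2703992365198028),
     ('john_smith', 2983, 10, 0.3341129301703976),
     ('helo', 19, 0, 0.0)],
    dtype=[('name', '<U255'), ('amount0', float), ('amount1', float), ('amount2', float)],
)

# Selecting a field gives a plain float array, so reductions work as usual.
total_amount0 = arr['amount0'].sum()

# A boolean mask built from one field selects whole records.
large = arr[arr['amount0'] > 1000]

print(total_amount0)
print(large['name'])
```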
But honestly, it sounds like you really want a pandas.DataFrame:
>>> import pandas as pd
>>> pd.DataFrame(data, columns=['name', 'amount0', 'amount1', 'amount2'])
         name amount0 amount1             amount2
0      julien    6270      17            0.270399
1  john_smith    2983      10  0.3341129301703976
2        helo      19       0                 0.0
>>>
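One caveat: when the rows hold their numbers as strings, as in the question, the DataFrame keeps them as strings rather than numbers. A hedged sketch of converting them with astype, assuming every amount parses as a float (the names rows, amount_columns and df are mine):

```python
import pandas as pd

rows = [
    ['julien', '6270', '17', '0.2703992365198028'],
    ['john_smith', '2983', '10', '0.3341129301703976'],
    ['helo', '19', '0', '0.0'],
]

df = pd.DataFrame(rows, columns=['name', 'amount0', 'amount1', 'amount2'])

# Convert only the amount columns to float; 'name' is left untouched.
amount_columns = ['amount0', 'amount1', 'amount2']
df = df.astype({column: float for column in amount_columns})

print(df.dtypes)
```

Here each column really does carry its own type, which is what the question was after.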
Notice that I had to modify your str datatype to accept unicode ('<U255'), because numpy interprets str as byte-strings. You could always make your strings bytes objects by encoding them. This is probably the way to go if you are only working with ascii characters.
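A rough sketch of that bytes variant, assuming ASCII-only names and an 'S10' field width (both assumptions of mine, as are the names rows and encoded):

```python
import numpy as np

rows = [
    ('julien', '6270', '17', '0.2703992365198028'),
    ('john_smith', '2983', '10', '0.3341129301703976'),
    ('helo', '19', '0', '0.0'),
]

# Encode each name to ASCII bytes; this raises UnicodeEncodeError for non-ASCII names.
encoded = [(name.encode('ascii'), a0, a1, a2) for name, a0, a1, a2 in rows]

# 'S10' (byte string, up to 10 bytes) is an assumed width; longer names would be cut off.
arr = np.array(
    encoded,
    dtype=[('name', 'S10'), ('amount0', float), ('amount1', float), ('amount2', float)],
)

print(arr['name'])
# Decode when a regular Python str is needed again.
print(arr['name'][1].decode('ascii'))
```

A byte-string field uses one byte per character, while a unicode field uses four, so this mainly pays off for large arrays.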
Upvotes: 2