Luis Figueiredo

Reputation: 35

datatype conflicts - Strings and Floats in one NumpyArray

I have the following arrays:

a = ['(0.0 | 0.0 | 0.0)', '(0.0 | 0.0 | 0.1)'] # strings
b = [0.0, 0.1] # floats
c = [0.0, 0.2] # floats
d = [0.0, 0.3] # floats
e = [0.0, 0.4] # floats

My goal is to create a final 2d array, such that the datatypes are preserved, with numpy:

final = [a, b, c, d, e] -> [ ['(0.0 | 0.0 | 0.0)', ...] , [0.0, 0.1], ... ]

When I tried to do this with

np.array([a, b, c, d, e])

the floats are converted to strings. So I looked at the NumPy dtype documentation and tried to define my own structured dtype:

dt = np.dtype([('f1', np.str), ('f2', np.float), ('f3', np.float), ('f4', np.float), ('f5', np.float)])
final = np.array([a, b, c, d, e], dtype=dt)

However, NumPy then tries to convert the string array into floats:

ValueError: could not convert string to float: '(0.0 | 0.0 | 0.0)'

Does anyone know what I'm doing wrong? This should be possible...

Upvotes: 0

Views: 613

Answers (1)

hpaulj

Reputation: 231728

In [256]: a = ['(0.0 | 0.0 | 0.0)', '(0.0 | 0.0 | 0.1)'] # strings
     ...: b = [0.0, 0.1] # floats
     ...: c = [0.0, 0.2] # floats
     ...: d = [0.0, 0.3] # floats
     ...: e = [0.0, 0.4] # floats

In [267]: dt = np.dtype([('f1', 'U20'), ('f2', float), ('f3', float),
     ...:                ('f4', float), ('f5', float)])

A structured array has to be initialized with a list of tuples, one tuple per record:

In [271]: [x for x in zip(a,b,c,d,e)]
Out[271]: 
[('(0.0 | 0.0 | 0.0)', 0.0, 0.0, 0.0, 0.0),
 ('(0.0 | 0.0 | 0.1)', 0.1, 0.2, 0.3, 0.4)]

In [273]: np.array([x for x in zip(a,b,c,d,e)],dtype=dt)
Out[273]: 
array([('(0.0 | 0.0 | 0.0)', 0. , 0. , 0. , 0. ),
       ('(0.0 | 0.0 | 0.1)', 0.1, 0.2, 0.3, 0.4)],
      dtype=[('f1', '<U20'), ('f2', '<f8'), ('f3', '<f8'), ('f4', '<f8'), ('f5', '<f8')])
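For anyone who wants to paste this outside of IPython, here is roughly the same construction as a standalone script. It is only a sketch: I use the builtin float for the numeric fields (the np.float and np.str aliases were removed in NumPy 1.24), and the names `records` and `final` are my own. The 'U20' width is the one chosen above; longer strings would be truncated.

```python
import numpy as np

a = ['(0.0 | 0.0 | 0.0)', '(0.0 | 0.0 | 0.1)']  # strings
b = [0.0, 0.1]  # floats
c = [0.0, 0.2]
d = [0.0, 0.3]
e = [0.0, 0.4]

# One string field (up to 20 unicode characters) and four float64 fields.
dt = np.dtype([('f1', 'U20'), ('f2', float), ('f3', float),
               ('f4', float), ('f5', float)])

# zip transposes the five column lists into one tuple per record.
records = list(zip(a, b, c, d, e))
final = np.array(records, dtype=dt)

print(final['f1'])  # the string column
print(final['f3'])  # one of the float columns
```

The important part is the zip: passing the plain list of lists makes numpy try to push every element into every field, which is where the "could not convert string to float" error comes from.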

Or filled field by field:

In [268]: arr = np.empty(2, dtype=dt)
In [269]: for n, x in zip(arr.dtype.names, [a,b,c,d,e]):
     ...:     arr[n] = np.array(x)
     ...:     
In [270]: arr
Out[270]: 
array([('(0.0 | 0.0 | 0.0)', 0. , 0. , 0. , 0. ),
       ('(0.0 | 0.0 | 0.1)', 0.1, 0.2, 0.3, 0.4)],
      dtype=[('f1', '<U20'), ('f2', '<f8'), ('f3', '<f8'), ('f4', '<f8'), ('f5', '<f8')])

Such an array can be accessed by field name or record number:

In [274]: arr['f1']
Out[274]: array(['(0.0 | 0.0 | 0.0)', '(0.0 | 0.0 | 0.1)'], dtype='<U20')
In [276]: arr['f3']
Out[276]: array([0. , 0.2])
In [277]: arr[0]
Out[277]: ('(0.0 | 0.0 | 0.0)', 0., 0., 0., 0.)

Note that this is a 1d array of 2 records with 5 fields each (shape (2,)), not a 2d array.
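A quick way to see the difference, and to get an ordinary 2d float array back out of the numeric fields when you need one. This is a sketch with the same data as above; `numeric` is just a name I picked, and np.column_stack is one of several ways to do this.

```python
import numpy as np

dt = np.dtype([('f1', 'U20'), ('f2', float), ('f3', float),
               ('f4', float), ('f5', float)])
arr = np.array([('(0.0 | 0.0 | 0.0)', 0.0, 0.0, 0.0, 0.0),
                ('(0.0 | 0.0 | 0.1)', 0.1, 0.2, 0.3, 0.4)], dtype=dt)

print(arr.shape)             # (2,): two records, the fields are not an axis
print(len(arr.dtype.names))  # 5 fields per record

# Stack the four float fields side by side into a regular 2d float64 array.
numeric = np.column_stack([arr[name] for name in arr.dtype.names[1:]])
print(numeric.shape)         # (2, 4)
```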

Another option is an object dtype array:

In [278]: np.array([a,b,c,d,e], dtype=object)
Out[278]: 
array([['(0.0 | 0.0 | 0.0)', '(0.0 | 0.0 | 0.1)'],
       [0.0, 0.1],
       [0.0, 0.2],
       [0.0, 0.3],
       [0.0, 0.4]], dtype=object)
In [279]: _.shape
Out[279]: (5, 2)
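With the object dtype the elements stay ordinary Python objects, so the original types are preserved, but numpy's numeric routines generally work best after converting the numeric part back to a float dtype. A small sketch of that (the names `obj` and `nums` are mine):

```python
import numpy as np

a = ['(0.0 | 0.0 | 0.0)', '(0.0 | 0.0 | 0.1)']  # strings
b = [0.0, 0.1]  # floats
c = [0.0, 0.2]
d = [0.0, 0.3]
e = [0.0, 0.4]

obj = np.array([a, b, c, d, e], dtype=object)

# Each cell holds a reference to the original Python object.
print(type(obj[0, 0]))  # <class 'str'>
print(type(obj[1, 1]))  # <class 'float'>

# Rows 1..4 are all floats, so they can be converted to a float64 array.
nums = obj[1:].astype(float)
print(nums.shape)       # (4, 2)
print(nums.sum(axis=0))
```

This gives the 2d layout asked for in the question, at the cost of losing the compact typed storage that the structured array has.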

Upvotes: 1
