scripton

Reputation: 27

Ndarray of lists with mix of floats and integers?

I have a 2-D NumPy array (an N-dimensional array, not literally an array of lists):

s_cluster_data
Out[410]: 
array([[ 0.9607611 ,  0.19538569,  0.        ],
       [ 1.03990463,  0.22274072,  0.        ],
       [ 1.09430461,  0.22603228,  0.        ],
       ...,
       [ 1.10802461, -0.54190659,  2.        ],
       [ 0.9288097 , -0.49195368,  2.        ],
       [ 0.81606986, -0.47141286,  2.        ]])

I would like the third column to hold integers. I tried assigning a structured dtype like this:

dtype=[('A','f8'),('B','f8'),('C','i4')]

s_cluster_data = np.array(s_cluster_data, dtype=dtype)
s_cluster_data

Out[414]: 
array([[( 0.9607611 ,  0.9607611 , 0), ( 0.19538569,  0.19538569, 0),
        ( 0.        ,  0.        , 0)],
       [( 1.03990463,  1.03990463, 1), ( 0.22274072,  0.22274072, 0),
        ( 0.        ,  0.        , 0)],
       [( 1.09430461,  1.09430461, 1), ( 0.22603228,  0.22603228, 0),
        ( 0.        ,  0.        , 0)],
       ...,
       dtype=[('A', '<f8'), ('B', '<f8'), ('C', '<i4')])

This creates a 2-D structured array instead: every individual element becomes its own record (displayed as a tuple), with its value copied into every field.
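A minimal sketch of what happened, using a short stand-in for s_cluster_data (the three rows and the name `data` are assumptions for illustration). Casting an existing 2-D float array to a structured dtype keeps its (N, 3) shape, and each scalar is written into every field of its own record, truncated to an integer in field 'C':

```python
import numpy as np

# Hypothetical stand-in for s_cluster_data: a 2-D float64 array of shape (N, 3)
data = np.array([[0.9607611, 0.19538569, 0.0],
                 [1.03990463, 0.22274072, 0.0],
                 [1.10802461, -0.54190659, 2.0]])

dt = np.dtype([('A', 'f8'), ('B', 'f8'), ('C', 'i4')])

# Casting the whole array keeps the (3, 3) shape: each scalar becomes a record
# whose three fields all receive that same scalar
bad = np.array(data, dtype=dt)
print(bad.shape)                              # (3, 3)
print(bad['A'][1, 0] == bad['B'][1, 0])       # True: same value in both float fields
print(bad['C'][1, 0])                         # 1: 1.03990463 truncated to int32
```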

I also tried taking the array apart, reading it back in as a list of tuples, and then returning it to its original 2-D layout.

list_cluster = s_cluster_data.tolist() # py list
tuple_cluster = [tuple(l) for l in list_cluster] # list of tuples

dtype=[('A','f8'),('B','f8'),('C','i4')]
sd_cluster_data = np.array(tuple_cluster, dtype=dtype) # array of tuples with dtype
sd_cluster_data

Out:   ...,
       (1.0020371 , -0.56034073, 2), (1.18264038, -0.55773913, 2),
       (1.00550194, -0.55359672, 2), (1.10802461, -0.54190659, 2),
       (0.9288097 , -0.49195368, 2), (0.81606986, -0.47141286, 2)],
      dtype=[('A', '<f8'), ('B', '<f8'), ('C', '<i4')])
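The 1-D shape is expected for a structured array: each element is one record, and columns are reached by field name rather than by a second index. A small sketch (the two `rows` are stand-ins for the question's data):

```python
import numpy as np

dt = np.dtype([('A', 'f8'), ('B', 'f8'), ('C', 'i4')])

# Hypothetical stand-in rows with the same layout as the question's data
rows = [(0.9607611, 0.19538569, 0),
        (1.10802461, -0.54190659, 2)]
sd = np.array(rows, dtype=dt)

print(sd.shape)         # (2,): a 1-D array of records, not (2, 3)
print(sd['C'].dtype)    # int32: each field keeps its own dtype
cluster_ids = sd['C']   # whole column by name, no list comprehension needed
```

Selecting a field this way returns a view of that column with the field's own dtype, which is usually simpler than building it with `[x[2] for x in ...]`.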

This output is close to what I want, except that I would like a 2-D array of rows rather than a 1-D array of tuples. So I tried to take the array apart and stack the columns back together:

x_val_arr = np.array([x[0] for x in sd_cluster_data])
y_val_arr = np.array([x[1] for x in sd_cluster_data])
cluster_id_arr = np.array([x[2] for x in sd_cluster_data])

coordinates_arr = np.stack((x_val_arr,y_val_arr,cluster_id_arr),axis=1)

But once again the third column comes back as floats:

coordinates_arr
Out[416]: 
array([[ 0.9607611 ,  0.19538569,  0.        ],
       [ 1.03990463,  0.22274072,  0.        ],
       [ 1.09430461,  0.22603228,  0.        ],
       ...,
       [ 1.10802461, -0.54190659,  2.        ],
       [ 0.9288097 , -0.49195368,  2.        ],
       [ 0.81606986, -0.47141286,  2.        ]])
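A short sketch of why the floats come back (the arrays `x` and `cluster_id` are hypothetical stand-ins). A plain, non-structured ndarray holds a single dtype for all of its elements, so np.stack promotes a float64 and an int32 column to their common type, float64. Whether the rows started out as lists or tuples does not matter at this point:

```python
import numpy as np

x = np.array([0.9607611, 1.10802461])           # float64 column
cluster_id = np.array([0, 2], dtype=np.int32)   # int32 column

# One ndarray, one dtype: stacking promotes everything to the common type
stacked = np.stack((x, x, cluster_id), axis=1)
print(stacked.dtype)                          # float64
print(np.result_type(np.float64, np.int32))   # float64
```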

This is probably due to my lack of domain knowledge, but do ndarrays not support mixed data types when the rows are lists rather than tuples?

Upvotes: 0

Views: 91

Answers (2)

hpaulj

Reputation: 231595

In [87]: import numpy.lib.recfunctions as rf                                    
In [88]: arr = np.array([[ 0.9607611 ,  0.19538569,  0.        ], 
    ...:        [ 1.03990463,  0.22274072,  0.        ], 
    ...:        [ 1.09430461,  0.22603228,  0.        ], 
    ...:        [ 1.10802461, -0.54190659,  2.        ], 
    ...:        [ 0.9288097 , -0.49195368,  2.        ], 
    ...:        [ 0.81606986, -0.47141286,  2.        ]])         
In [89]: arr                                                                    
Out[89]: 
array([[ 0.9607611 ,  0.19538569,  0.        ],
       [ 1.03990463,  0.22274072,  0.        ],
       [ 1.09430461,  0.22603228,  0.        ],
       [ 1.10802461, -0.54190659,  2.        ],
       [ 0.9288097 , -0.49195368,  2.        ],
       [ 0.81606986, -0.47141286,  2.        ]])

There are various ways of constructing a structured array from a 2-D array like this. Recent versions of NumPy (1.16 and later) provide a convenient unstructured_to_structured function:

In [90]: dt = np.dtype([('A','f8'),('B','f8'),('C','i4')])     
In [92]: rf.unstructured_to_structured(arr, dt)                                 
Out[92]: 
array([(0.9607611 ,  0.19538569, 0), (1.03990463,  0.22274072, 0),
       (1.09430461,  0.22603228, 0), (1.10802461, -0.54190659, 2),
       (0.9288097 , -0.49195368, 2), (0.81606986, -0.47141286, 2)],
      dtype=[('A', '<f8'), ('B', '<f8'), ('C', '<i4')])

Each row of arr has been turned into a structured record, displayed as a tuple.
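A sketch of the round trip, assuming NumPy 1.16 or later for numpy.lib.recfunctions (the two-row `arr` is a stand-in). Converting back with structured_to_unstructured needs a single common dtype again, so field 'C' returns as float64, which is the same effect seen in the question:

```python
import numpy as np
import numpy.lib.recfunctions as rf

arr = np.array([[0.9607611, 0.19538569, 0.0],
                [1.10802461, -0.54190659, 2.0]])
dt = np.dtype([('A', 'f8'), ('B', 'f8'), ('C', 'i4')])

rec = rf.unstructured_to_structured(arr, dt)
print(rec.shape, rec['C'].dtype)   # (2,) int32

# Going back to a plain 2-D array promotes all fields to one dtype
back = rf.structured_to_unstructured(rec)
print(back.dtype)                  # float64
```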

A functionally equivalent approach is to create a 'blank' array and assign the field values by name:

In [93]: res = np.zeros(arr.shape[0], dt)                                       
In [94]: res                                                                    
Out[94]: 
array([(0., 0., 0), (0., 0., 0), (0., 0., 0), (0., 0., 0), (0., 0., 0),
       (0., 0., 0)], dtype=[('A', '<f8'), ('B', '<f8'), ('C', '<i4')])
In [95]: res['A'] = arr[:,0]                                                    
In [96]: res['B'] = arr[:,1]                                                    
In [97]: res['C'] = arr[:,2]                                                    
In [98]: res                                                                    
Out[98]: 
array([(0.9607611 ,  0.19538569, 0), (1.03990463,  0.22274072, 0),
       (1.09430461,  0.22603228, 0), (1.10802461, -0.54190659, 2),
       (0.9288097 , -0.49195368, 2), (0.81606986, -0.47141286, 2)],
      dtype=[('A', '<f8'), ('B', '<f8'), ('C', '<i4')])
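The same field-by-field assignment can be driven by the dtype's field names, so it works for any number of columns. A minimal sketch, assuming the column order of `arr` matches the field order of `dt`:

```python
import numpy as np

arr = np.array([[0.9607611, 0.19538569, 0.0],
                [0.81606986, -0.47141286, 2.0]])
dt = np.dtype([('A', 'f8'), ('B', 'f8'), ('C', 'i4')])

res = np.zeros(arr.shape[0], dtype=dt)
# Assumption: column i of arr belongs in the i-th field of dt
for i, name in enumerate(dt.names):
    res[name] = arr[:, i]   # values are cast to each field's dtype on assignment
```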

And, to belabor the point, we could also make the structured array from a list of tuples:

In [104]: np.array([tuple(row) for row in arr.tolist()], dt)                    
Out[104]: 
array([(0.9607611 ,  0.19538569, 0), (1.03990463,  0.22274072, 0),
       (1.09430461,  0.22603228, 0), (1.10802461, -0.54190659, 2),
       (0.9288097 , -0.49195368, 2), (0.81606986, -0.47141286, 2)],
      dtype=[('A', '<f8'), ('B', '<f8'), ('C', '<i4')])
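A quick check, on a stand-in `arr`, that the list-of-tuples route and unstructured_to_structured produce the same records (structured arrays with identical dtypes compare record by record):

```python
import numpy as np
import numpy.lib.recfunctions as rf

arr = np.array([[0.9607611, 0.19538569, 0.0],
                [0.81606986, -0.47141286, 2.0]])
dt = np.dtype([('A', 'f8'), ('B', 'f8'), ('C', 'i4')])

from_func = rf.unstructured_to_structured(arr, dt)
from_tuples = np.array([tuple(row) for row in arr.tolist()], dt)

# Elementwise comparison yields one boolean per record
same = (from_func == from_tuples).all()
print(same)   # True
```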

Upvotes: 1

maciek97x

Reputation: 7330

The problem may be the way you pass data to np.array: with a structured dtype, each row should be a tuple.

a = np.array([( 0.9607611 ,  0.19538569,  0.        )], dtype='f8, f8, i4')

creates the array

array([(0.9607611, 0.19538569, 0)],
      dtype=[('f0', '<f8'), ('f1', '<f8'), ('f2', '<i4')])
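A sketch contrasting tuples and lists with the same structured dtype (the single row is taken from the question). NumPy treats a tuple as one record but a list as another array dimension, so each scalar in a list becomes its own record:

```python
import numpy as np

dt = 'f8, f8, i4'   # string shorthand; fields get default names f0, f1, f2

as_tuples = np.array([(0.9607611, 0.19538569, 0.0)], dtype=dt)
as_lists = np.array([[0.9607611, 0.19538569, 0.0]], dtype=dt)

print(as_tuples.shape)        # (1,): one record per tuple
print(as_lists.shape)         # (1, 3): one record per scalar
print(as_tuples.dtype.names)  # ('f0', 'f1', 'f2')
```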

Upvotes: 1
