Priyash Maini

Reputation: 25

How to handle mixed data types in numpy arrays

I'm stuck on this NumPy problem.

country=['India','USA']
gdp=[22,33]

import numpy as np
a=np.column_stack((country,gdp))

array([['India', '22'],
       ['USA', '33']], dtype='<U11')

I have an ndarray and I want to find the maximum of the second column. I tried the following:

print(a.max(axis=1)[1])
print(a[:,1].max())

It threw this error: TypeError: cannot perform reduce with flexible type

I then tried converting the array to a structured dtype:

datatype=([('country',np.str_,64),('gross',np.float32)])

new=np.array(a,dtype=datatype)

But that raised the following error:

could not convert string to float: 'India'.

Upvotes: 2

Views: 8176

Answers (2)

jpp

Reputation: 164623

Consider using numpy structured arrays for mixed types. You will have no issues if you explicitly set data types.

This is often necessary, and certainly advisable, with numpy.

import numpy as np

country = ['India','USA','UK']
gdp = [22,33,4]

a = np.array(list(zip(country, gdp)),
             dtype=[('Country', '|S11'), ('Number', '<i8')])

res_asc = np.sort(a, order='Number')

# array([(b'UK', 4), (b'India', 22), (b'USA', 33)], 
#       dtype=[('Country', 'S11'), ('Number', '<i8')])

res_desc = np.sort(a, order='Number')[::-1]

# array([(b'USA', 33), (b'India', 22), (b'UK', 4)], 
#       dtype=[('Country', 'S11'), ('Number', '<i8')])
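If only the maximum is needed rather than a sorted copy, a field of a structured array can be selected by name and reduced directly. This is a minimal sketch reusing the field names above; the 'U11' Unicode dtype is my substitution for 'S11' (an assumption, chosen so the country comes back as str rather than bytes).

```python
import numpy as np

country = ['India', 'USA', 'UK']
gdp = [22, 33, 4]

# Same layout as above, but with a Unicode field instead of bytes (assumption).
a = np.array(list(zip(country, gdp)),
             dtype=[('Country', 'U11'), ('Number', '<i8')])

# Indexing by field name gives a plain integer array, so max() works.
max_gdp = a['Number'].max()

# argmax() gives the row position, which recovers the matching country.
top_row = a[a['Number'].argmax()]
top_country = str(top_row['Country'])

print(max_gdp, top_country)
# 33 USA
```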

Upvotes: 0

swathis

Reputation: 366

The error is caused by the string data in your array, which makes the dtype of the whole array a Unicode string (indicated by U11, i.e. a Unicode string of up to 11 characters). If you want to store the numbers in numerical form, use structured arrays. However, if you only want to compute the maximum of the numerical column, use

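print(a[:, 1].astype(int).max())
# 33

You may choose another numerical dtype, such as float in place of int, based on the nature of the data in that column. Note that the np.int and np.float aliases were removed in NumPy 1.24, so use the built-in int and float (or explicit types such as np.int64 and np.float64) instead.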
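The same conversion can also tell you which row holds the maximum. The sketch below builds the string array literally, with the same contents as the array in the question, and the variable names values and idx are only illustrative.

```python
import numpy as np

# Same contents as the array in the question: every element is a string.
a = np.array([['India', '22'],
              ['USA', '33']])

# Convert only the numeric column; the original array is left unchanged.
values = a[:, 1].astype(float)

# Position of the largest value, used to look up the matching country.
idx = values.argmax()

print(a[idx, 0], values[idx])
# USA 33.0
```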

Upvotes: 2
