Amlesh Kanekar

Reputation: 47

Python ndarray with elements of different types

I want to create an array that holds mixed types: strings and ints.

The following code did not work as desired: every element was converted to a string.

>>> a=numpy.array(["Str",1,2,3,4])
>>> print a
['Str' '1' '2' '3' '4']
>>> print type(a[0]),type(a[1])
<type 'numpy.string_'> <type 'numpy.string_'>

Every element of the array was typed as 'numpy.string_'.

Oddly enough, if I pass None as one of the elements, the types come out as desired:

>>> a=numpy.array(["Str",None,2,3,4])
>>> print a
['Str' None 2 3 4]
>>> print type(a[0]),type(a[1]),type(a[2])
<type 'str'> <type 'NoneType'> <type 'int'>

So including a None element gives me a workaround, but I would like to understand why. Even without a None element, shouldn't each element keep the type it was passed with?

Upvotes: 3

Views: 798

Answers (2)

hpaulj

Reputation: 231665

An alternative to adding the None is to make the dtype explicit:

In [80]: np.array(["str",1,2,3,4])
Out[80]: array(['str', '1', '2', '3', '4'], dtype='<U3')
In [81]: np.array(["str",1,2,3,4], dtype=object)
Out[81]: array(['str', 1, 2, 3, 4], dtype=object)
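To check that the explicit dtype really keeps each element's original Python type, you can inspect the elements directly. This is a minimal sketch assuming Python 3 and a recent NumPy; the variable name is illustrative.

```python
import numpy as np

# Force the generic object dtype so NumPy stores references to the original objects
a = np.array(["str", 1, 2, 3, 4], dtype=object)

print(a.dtype)                        # object
print([type(x).__name__ for x in a])  # ['str', 'int', 'int', 'int', 'int']
```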

Creating an object dtype array and filling it from a list is another option:

In [85]: res = np.empty(5, object)
In [86]: res
Out[86]: array([None, None, None, None, None], dtype=object)
In [87]: res[:] = ['str', 1, 2, 3, 4]
In [88]: res
Out[88]: array(['str', 1, 2, 3, 4], dtype=object)

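It isn't needed here, but it matters when you want an array of lists: given equal-length lists, np.array builds a multidimensional array instead of an array of list objects.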
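The difference can be sketched with a hypothetical list of equal-length rows, assuming Python 3 and a recent NumPy. The sketch fills the pre-allocated object array one element at a time, which stores each list as a single object rather than letting NumPy treat the nested lists as a 2-D block.

```python
import numpy as np

rows = [[1, 2], [3, 4], [5, 6]]  # hypothetical example input

# Equal-length lists passed straight to np.array become a 2-D numeric array
direct = np.array(rows)
print(direct.shape)  # (3, 2)

# Pre-allocate a 1-D object array, then assign each list as one element
res = np.empty(len(rows), dtype=object)
for i, row in enumerate(rows):
    res[i] = row

print(res.shape)     # (3,)
print(type(res[0]))  # <class 'list'>
```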

Upvotes: 1

jpp

Reputation: 164803

Mixing types in a NumPy array is strongly discouraged: you lose the benefits of vectorised computation. In this instance:

  • For your first array, NumPy converts every element to a string and stores the result as a uniform array of strings of at most 3 characters (<U3).
  • For your second array, NumPy's automatic type discovery does not promote None to a string, so it falls back to the generic object dtype. An object array stores pointers to arbitrary Python objects, which is why each element keeps its original type.
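The practical cost of each outcome can be sketched as follows, assuming Python 3 and a recent NumPy (where the first array comes out as <U3); the variable names are illustrative. Numbers in the string array have to be parsed back before any arithmetic, while the object array keeps real ints but performs its arithmetic one Python object at a time.

```python
import numpy as np

mixed_str = np.array(["Str", 1, 2, 3, 4])     # coerced to a string dtype
mixed_obj = np.array(["Str", None, 2, 3, 4])  # falls back to object dtype

# The numbers in the string array are now text and must be parsed back
nums_from_str = mixed_str[1:].astype(int)
print(nums_from_str * 2)  # [2 4 6 8]

# The object array still holds Python ints, but arithmetic is dispatched
# to Python for each element and the result stays object dtype
nums_from_obj = mixed_obj[2:] * 2
print(nums_from_obj)
print(nums_from_obj.dtype)  # object
```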

You can confirm the chosen dtypes by printing the dtype attribute of each array:

import numpy as np
print(np.array(["Str",1,2,3,4]).dtype)     # <U3
print(np.array(["Str",None,2,3,4]).dtype)  # object

This behaviour is expected. NumPy strongly prefers homogeneous types, as you should for any meaningful computation. If you genuinely need mixed types, a plain Python list may be a more appropriate data structure.

For a more detailed description of how NumPy prioritises dtype choice, see:

Upvotes: 2
