Kevin

Reputation: 3358

Numpy empty list type inference

Why is the empty list [] being inferred as float type when using np.append?

np.append([1,2,3], [0])
# output: array([1, 2, 3, 0]), dtype = np.int64

np.append([1,2,3], [])
# output: array([1., 2., 3.]), dtype = np.float64

This persists even when arr is np.array([1,2,3], dtype=np.int32).

It's not possible to specify a dtype for append, so I am just curious about why this happens. Numpy's concatenate does the same thing, but when I try to specify the dtype I get an error:

np.concatenate([[1,2,3], []], dtype=np.int64)

Error:

TypeError: Cannot cast array data from dtype('float64') to dtype('int64') according to the rule 'same_kind'

However, if I set the casting rule to 'unsafe', it works:

np.concatenate([[1,2,3], []], dtype=np.int64, casting='unsafe')
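A minimal reproduction of the behaviour described above, assuming NumPy 1.20 or later (the dtype= keyword of np.concatenate does not exist in older versions). The variable names are illustrative only.

```python
import numpy as np

arr = np.array([1, 2, 3], dtype=np.int32)

# [] is converted to a float64 array, so int32 + float64 promotes to float64.
print(np.append(arr, []).dtype)          # float64

# Asking for int64 output under the default casting='same_kind' fails,
# because float64 -> int64 is not a same-kind cast.
try:
    np.concatenate([[1, 2, 3], []], dtype=np.int64)
except TypeError as exc:
    print("TypeError:", exc)

# casting='unsafe' permits the float64 -> int64 cast.
out = np.concatenate([[1, 2, 3], []], dtype=np.int64, casting="unsafe")
print(out, out.dtype)                    # [1 2 3] int64
```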

Why is [] considered a float?

Upvotes: 9

Views: 1028

Answers (2)

hpaulj

Reputation: 231385

Look at the code for np.append (via docs link or ipython):

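# Excerpt of np.append; the import line is added so it runs standalone
from numpy import asanyarray, ravel, concatenate

def append(arr, values, axis=None):
    arr = asanyarray(arr)          # first argument becomes an array
    if axis is None:
        if arr.ndim != 1:
            arr = arr.ravel()      # flatten arr to 1-D
        values = ravel(values)     # values is converted and flattened too
        axis = arr.ndim-1
    return concatenate((arr, values), axis=axis)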

The first argument is turned into an array, if it isn't one already.

You don't specify the axis, so both arr and values are ravelled, i.e. turned into 1-D arrays. np.ravel is also Python code, and does asanyarray(a).ravel(order=order).

So the dtype inference is done by np.asanyarray.
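A short sketch of that inference step, and of np.append (with axis=None) reducing to a concatenate of the ravelled, converted inputs. The exact integer dtype of the first line is platform dependent, so no output is asserted for it.

```python
import numpy as np

# asanyarray infers the dtype from the list contents.
print(np.asanyarray([1, 2, 3]).dtype)   # platform default integer
print(np.asanyarray([]).dtype)          # float64: nothing to infer from

# With axis=None, np.append behaves like concatenating the ravelled,
# array-converted inputs.
a = np.append([1, 2, 3], [])
b = np.concatenate((np.asanyarray([1, 2, 3]).ravel(), np.ravel([])))
print(a.dtype == b.dtype, np.array_equal(a, b))   # True True
```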

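The rest of the work is done by np.concatenate. It, too, converts the inputs to arrays if necessary. The result dtype is the promoted ("highest") type of the inputs, so an int64 array joined with a float64 one gives float64, even if the float64 one is empty.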
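np.result_type applies NumPy's type-promotion rules, so it can be used to preview the "highest" dtype described above. This is a sketch of the promotion rule, not a claim about np.concatenate's internal code path.

```python
import numpy as np

# Promotion of the input dtypes.
print(np.result_type(np.int64, np.float64))   # float64
print(np.result_type(np.int32, np.float64))   # float64
print(np.result_type(np.int32, np.int64))     # int64

# An empty float64 array still takes part in promotion.
ints = np.array([1, 2, 3], dtype=np.int32)
print(np.concatenate((ints, np.array([]))).dtype)   # float64
```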

np.append is a poorly conceived (IMO) alternative way of using np.concatenate. It is not a list append clone.

Also be careful about "empty" arrays:

In [73]: np.array([])
Out[73]: array([], dtype=float64)
In [74]: np.empty((0))
Out[74]: array([], dtype=float64)
In [75]: np.empty((0),int)
Out[75]: array([], dtype=int64)

The common list idiom

alist = []
for i in range(10):
    alist.append(i)

does not translate well into numpy. Build a list of arrays, and do one concatenate/vstack at the end. Don't start from an "empty" array, however it was created, and append to it repeatedly.
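A minimal sketch of both patterns: collecting arrays in a Python list and joining once, versus seeding with np.array([]) and calling np.append in a loop. The piece contents are arbitrary illustration data.

```python
import numpy as np

# Recommended: cheap list appends, then a single join at the end.
pieces = []
for i in range(10):
    pieces.append(np.array([i, i * i]))

flat = np.concatenate(pieces)      # shape (20,), integer dtype
stacked = np.vstack(pieces)        # shape (10, 2)
print(flat.shape, stacked.shape, flat.dtype)

# Anti-pattern: the float64 "empty" seed forces float64, and each
# np.append call copies the whole accumulated array.
acc = np.array([])
for p in pieces:
    acc = np.append(acc, p)
print(acc.dtype)                   # float64
```

Besides the dtype surprise, the loop version is quadratic in the total size because every np.append allocates a new array.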

Upvotes: 1

Jérôme Richard

Reputation: 50488

np.append is subject to well-defined semantic rules like any Numpy binary operation. It first converts the input operands to Numpy arrays if they are not already (typically with np.array), then applies the semantic rules to find the type of the resulting array and checks that the operation is valid before performing the actual operation (here, the concatenation). According to the documentation, the array type returned by np.array is "determined as the minimum type required to hold the objects in the sequence". When the list is empty, as in your case, there is nothing to infer from, so the default type numpy.float64 is used, which is also the default stated in the documentation of np.empty. This arbitrary choice was made long ago and has not been changed since, so as not to break old code. Please note that not all Numpy developers seem to agree with the current choice, so this is still a matter of debate. For more information, you can read this open issue.
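The two facts above can be checked directly. np.can_cast performs the kind of validity check that rejects the question's int64 request under 'same_kind' casting; using it here is an illustration, not a description of np.concatenate's internals.

```python
import numpy as np

# Default dtype for an empty sequence, and np.empty's default dtype.
print(np.array([]).dtype)        # float64
print(np.empty(0).dtype)         # float64

# float64 -> int64 is rejected under 'same_kind' but allowed under 'unsafe'.
print(np.can_cast(np.float64, np.int64, casting="same_kind"))  # False
print(np.can_cast(np.float64, np.int64, casting="unsafe"))     # True
```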

The rule of thumb is to use either existing Numpy arrays or to perform an explicit conversion to a Numpy array using np.array with a fixed dtype parameter (as described in the above comments).
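A minimal sketch of that rule of thumb: give the "empty" operand an explicit dtype matching the existing array, so no float64 enters the promotion.

```python
import numpy as np

arr = np.array([1, 2, 3], dtype=np.int32)

# Explicit dtype on the empty operand keeps the result int32.
empty = np.array([], dtype=arr.dtype)
result = np.append(arr, empty)
print(result, result.dtype)      # [1 2 3] int32
```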

Upvotes: 6
