Diego F Medina

Reputation: 499

Different type of pandas Series elements (numpy ints) during list comprehension

I have noticed that with numpy 1.18.4 (and not with previous numpy versions) the element type seen while iterating in a list comprehension is different from the type obtained by element-wise access. For example:

import numpy as np
import pandas as pd
foo = pd.DataFrame(data={'a': np.array([1, 2, 3]), 'b': np.array([1, 0, 1])})
var = {type(x) == type(foo['a'][i]) for i, x in enumerate(foo['a'])}

I get var = {False}, i.e. the type produced by iteration never matches the type produced by indexing. What is the reason for this, and why was it not the case before?
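Printing the two types side by side shows the mismatch (continuing with foo from above; the exact scalar types will depend on the pandas/NumPy versions and the platform):

for i, x in enumerate(foo['a']):
    # x comes from iterating the Series, foo['a'][i] from indexing into it
    print(type(x), type(foo['a'][i]))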


Ideally, when dividing by zero, I would like to avoid the ZeroDivisionError and instead get the usual inf produced by numpy.int32 when doing:

[0 if x == 0 and z == 0 else x / y for x, y, z in zip(foo['a'], foo['b'], c)]

where c is another array of int32s. Is there any way to do this without converting the elements back to np.int32 inside the list comprehension?
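For reference, this is the division behaviour I mean, assuming a standard NumPy build: plain Python ints raise on division by zero, while NumPy integer scalars return inf and only emit a RuntimeWarning:

import numpy as np

# 1 / 0                     # plain Python ints -> ZeroDivisionError
np.int32(1) / np.int32(0)   # numpy scalar -> inf (with a RuntimeWarning)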

Upvotes: 0

Views: 51

Answers (1)

Ben.T

Reputation: 29635

IIUC what you want, you can use to_numpy on the columns from foo.

import numpy as np
import pandas as pd

foo = pd.DataFrame(data={'a': np.array([0, 2, 3]), 'b': np.array([1, 0, 1])})
c = np.array([0, 1, 1])

[0 if x == 0 and z == 0 else x / y
 for x, y, z in zip(foo['a'].to_numpy(), foo['b'].to_numpy(), c)]
# [0, inf, 3.0]

It works, although it raises RuntimeWarning: divide by zero encountered in long_scalars.
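If the warning is unwanted, it can be silenced locally with np.errstate; a minimal sketch, reusing foo and c from the snippet above (this only hides the warning, the inf result is unchanged):

import numpy as np

with np.errstate(divide='ignore'):
    result = [0 if x == 0 and z == 0 else x / y
              for x, y, z in zip(foo['a'].to_numpy(), foo['b'].to_numpy(), c)]
# result == [0, inf, 3.0]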

Another alternative is to specify a pandas type like pd.Int32Dtype() when creating foo:

foo = pd.DataFrame(data={'a': np.array([0, 2, 3]), 'b': np.array([1, 0, 1])},
                   dtype=pd.Int32Dtype())
# or, if foo already exists, convert it with astype:
# foo = foo.astype(pd.Int32Dtype())

c = np.array([0,1,1])
[0 if x == 0 and z == 0 else x / y for x, y, z in zip(foo['a'], foo['b'], c)]

This gives the same result, [0, inf, 3.0].
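As an aside, if only the final values are needed, the same logic can be written fully vectorized with np.where instead of a list comprehension; a sketch under the same data, with the arrays spelled out so it is self-contained (not part of the original approach):

import numpy as np

a = np.array([0, 2, 3])
b = np.array([1, 0, 1])
c = np.array([0, 1, 1])

with np.errstate(divide='ignore'):
    # compute a / b everywhere, then force the rows where both a and c are 0 to 0
    result = np.where((a == 0) & (c == 0), 0, a / b)
# array([ 0., inf,  3.])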

Upvotes: 1
