Reputation: 6606
I'm trying to implement a function that returns the element-wise maximum of two DataFrames or Series, preferring a non-NaN value wherever one exists.
In [217]: a
Out[217]:
   0  1
0  4  1
1  6  0
[2 rows x 2 columns]
In [218]: b
Out[218]:
    0   1
0 NaN   3
1   3 NaN
[2 rows x 2 columns]
In [219]: do_not_replace = b.isnull() | (a > b)
In [220]: do_not_replace
Out[220]:
      0      1
0  True  False
1  True   True
[2 rows x 2 columns]
In [221]: a.where(do_not_replace, b)
Out[221]:
   0  1
0  4  3
1  1  0
[2 rows x 2 columns]
In [222]: expected
Out[222]:
   0  1
0  4  3
1  6  0
[2 rows x 2 columns]
In [223]: pd.__version__
Out[223]: '0.13.1'
I imagine there are other ways to implement this function, but I'm unable to figure out this behavior. I mean, where is that 1 coming from? I think the logic is sound. Am I misinterpreting how the function works?
Upvotes: 2
Views: 378
Reputation: 128918
This is essentially what where does internally. I think this might be a transpositional bug. Bug fixed here. It turns out that a symmetric DataFrame AND a passed frame were required to reproduce it. Very subtle. Note that this other form of indexing (below) uses a different method that operates in place, so it was not affected.
In [56]: a[~do_not_replace] = b
In [57]: a
Out[57]:
   0  1
0  4  3
1  6  0
Note: this has been fixed in master/0.14.1.
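The comparison between the two forms can be sketched as follows. This is a minimal example, assuming a pandas release that includes the fix; the float-valued frames and the names via_where and via_assignment are illustrative assumptions, not part of the original session.

```python
import numpy as np
import pandas as pd

# Float frames are an assumption here, chosen so no dtype conversion is involved.
a = pd.DataFrame([[4.0, 1.0], [6.0, 0.0]])
b = pd.DataFrame([[np.nan, 3.0], [3.0, np.nan]])

# Keep a wherever b is missing or a is strictly larger.
do_not_replace = b.isnull() | (a > b)

# Form 1: where() returns a new frame, taking values from b where the mask is False.
via_where = a.where(do_not_replace, b)

# Form 2: boolean-mask assignment on a copy, so the original a is left untouched.
via_assignment = a.copy()
via_assignment[~do_not_replace] = b

print(via_where)
print(via_assignment)
```

On a fixed version both forms should produce the expected frame from the question.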
Upvotes: 5
Reputation: 79732
I can't reproduce this problem with "plain" numpy arrays:
import numpy as np
a = np.array([(4, 1), (6, 0)])
b = np.array([(np.nan, 3), (3, np.nan)])
print(a)
print(b)
# Keep a wherever b is NaN or a is strictly larger
do_not_replace = np.isnan(b) | (a > b)
print(do_not_replace)
print(np.where(do_not_replace, a, b))
... gives what you want, I think:
array([[ 4.,  3.],
       [ 6.,  0.]])
@jwilner: As @Jeff suggests, it could be a pandas bug. What version are you running?
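For the stated goal of an element-wise maximum that prefers non-NaN values, NumPy's fmax may be a simpler route than building a mask by hand. The following is a sketch on plain float arrays, not the approach used in the question; the variable name result is illustrative.

```python
import numpy as np

a = np.array([[4.0, 1.0], [6.0, 0.0]])
b = np.array([[np.nan, 3.0], [3.0, np.nan]])

# np.fmax takes the element-wise maximum; where exactly one operand
# is NaN it returns the other operand instead of propagating NaN.
result = np.fmax(a, b)
print(result)
```

If both operands are NaN at a position, fmax still returns NaN there, which matches the "prefer non-NaN" intent.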
Upvotes: 1