Reputation: 3123
a b c x a1 b1 c1 x1
0 aa ba ca 9 NaN NaN NaN 1
1 ab bb cb 9 NaN NaN NaN NaN
2 ac bc cd NaN NaN NaN NaN NaN
3 ad bd cd 9 1 NaN NaN NaN
4 ae be ce 9 NaN 2 NaN 12
5 af bf cf 9 NaN NaN 3 14
6 ag bg cg 9 3 NaN 1 45
a b c x a1 b1 c1 x1
0 aa ba ca 9 NaN NaN NaN 1
1 ab bb cb 9 NaN NaN NaN 9
2 ac bc cc NaN NaN NaN NaN NaN
3 ad bd cd 9 1 NaN NaN NaN
4 ae be ce 9 NaN 2 NaN 12
5 af bf cf 9 NaN NaN 3 14
6 ag bg cg 9 3 NaN 1 45
row 0: I don't want the replacement to happen, when there's already a value in x1
, and even when all of the a1
, b1
and c1
are NaN
s.
rows 1 & 2: I want to fill column x1
with whatever is in x
, only and only when columns a1
, b1
and c1
are NaN
s.
rows 3, 4, 5 & 6: I want the column x1
to stay as is, when any one or two columns from a1
, b1
and c1
are NaN
s.
Is there an efficient way to do this operation?
I have tried as shown in this and this stackoverflow answers, but I get broadcast error, presumably because I am trying to do the check on multiple columns.
Here's what I have tried:
np.where(np.isnan(df[['a1', 'b1', 'c1']].values), df['x'].values, df['x1'].values)
and my error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-32-bb6a0f9faf18> in <module>
----> 1 np.where(np.isnan(df[['a', 'b', 'c']].values), df['x'].values, df['x1'].values)
<__array_function__ internals> in where(*args, **kwargs)
ValueError: operands could not be broadcast together with shapes (176213,6) (176213,) (176213,)
Upvotes: 1
Views: 820
Reputation: 35626
One option is to use any
on axis=1:
df['x1'] = np.where(df[['a1', 'b1', 'c1', 'x1']].any(axis=1), df['x1'], df['x'])
df
:
a b c x a1 b1 c1 x1
0 aa ba ca 9.0 NaN NaN NaN 1.0
1 ab bb cb 9.0 NaN NaN NaN 9.0
2 ac bc cd NaN NaN NaN NaN NaN
3 ad bd cd 9.0 1.0 NaN NaN NaN
4 ae be ce 9.0 NaN 2.0 NaN 12.0
5 af bf cf 9.0 NaN NaN 3.0 14.0
6 ag bg cg 9.0 3.0 NaN 1.0 45.0
Another option via @Jon Clements♦:
df.loc[~df[['a1', 'b1', 'c1', 'x1']].any(1), 'x1'] = df['x']
The reason for negating any
instead of using all
is that any
will consider NaN
as falsey, while all
considers NaN
as truthy:
pd.Series([np.nan, np.nan]).any() # False
pd.Series([np.nan, np.nan]).all() # True
Upvotes: 2