Reputation: 543
Consider I have a data frame :
>>> data
c0 c1 c2 _c1 _c2
0 0 1 2 18.0 19.0
1 3 4 5 NaN NaN
2 6 7 8 20.0 21.0
3 9 10 11 NaN NaN
4 12 13 14 NaN NaN
5 15 16 17 NaN NaN
I want to update the values in the c1 and c2 columns with the values in the _c1 and _c2 columns whenever those latter values are not NaN
. Why won't the following work, and what is the correct way to do this?
>>> data.loc[~(data._c1.isna()),['c1','c2']]=data.loc[~(data._c1.isna()),['_c1','_c2']]
>>> data
c0 c1 c2 _c1 _c2
0 0 NaN NaN 18.0 19.0
1 3 4.0 5.0 NaN NaN
2 6 NaN NaN 20.0 21.0
3 9 10.0 11.0 NaN NaN
4 12 13.0 14.0 NaN NaN
5 15 16.0 17.0 NaN NaN
For completeness's sake I want the result to look like
>>> data.loc[~(data._c1.isna()),['c1','c2']]=data.loc[~(data._c1.isna()),['_c1','_c2']]
>>> data
c0 c1 c2 _c1 _c2
0 0 18.0 19.0. 18.0 19.0
1 3 4.0 5.0 NaN NaN
2 6 20.0 21.0 20.0 21.0
3 9 10.0 11.0 NaN NaN
4 12 13.0 14.0 NaN NaN
5 15 16.0 17.0 NaN NaN
Upvotes: 0
Views: 540
Reputation: 120559
You can use np.where
:
df[['c1', 'c2']] = np.where(df[['_c1', '_c2']].notna(),
df[['_c1', '_c2']],
df[['c1', 'c2']])
print(df)
# Output:
c0 c1 c2 _c1 _c2
0 0 18.0 19.0 18.0 19.0
1 3 4.0 5.0 NaN NaN
2 6 20.0 21.0 20.0 21.0
3 9 10.0 11.0 NaN NaN
4 12 13.0 14.0 NaN NaN
5 15 16.0 17.0 NaN NaN
Update
Do you know by any chance WHY the above doesn't work the way I thought it would?
Your column names from left and right side of your expression are different so Pandas can't use this values even if the shape is the same.
# Left side of your expression
>>> data.loc[~(data._c1.isna()),['c1','c2']]
c1 c2 # <- note the column names
0 18.0 19.0
2 20.0 21.0
# Right side of your expression
>>> data.loc[~(data._c1.isna()),['_c1','_c2']]
_c1 _c2 # <- Your column names are difference from left side
0 18.0 19.0
2 20.0 21.0
How to solve it? Simply use .values
on the right side. As your right side is not row/column indexed, Pandas use the shape to set the values.
data.loc[~(data._c1.isna()),['c1','c2']] = \
data.loc[~(data._c1.isna()),['_c1','_c2']].values
print(data)
# Output:
c0 c1 c2 _c1 _c2
0 0 18.0 19.0 18.0 19.0
1 3 4.0 5.0 NaN NaN
2 6 20.0 21.0 20.0 21.0
3 9 10.0 11.0 NaN NaN
4 12 13.0 14.0 NaN NaN
5 15 16.0 17.0 NaN NaN
Upvotes: 1
Reputation: 323376
I recommend update
after rename
df.update(df[['_c1','_c2']].rename(columns={'_c1':'c1','_c2':'c2'}))
df
Out[266]:
c0 c1 c2 _c1 _c2
0 0 18.0 19.0 18.0 19.0
1 3 4.0 5.0 NaN NaN
2 6 20.0 21.0 20.0 21.0
3 9 10.0 11.0 NaN NaN
4 12 13.0 14.0 NaN NaN
5 15 16.0 17.0 NaN NaN
Upvotes: 1