nbk
nbk

Reputation: 543

Setting Values with pandas DataFrame.loc

Consider I have a data frame :

>>> data
   c0  c1  c2   _c1   _c2
0   0   1   2  18.0  19.0
1   3   4   5   NaN   NaN
2   6   7   8  20.0  21.0
3   9  10  11   NaN   NaN
4  12  13  14   NaN   NaN
5  15  16  17   NaN   NaN

I want to update the values in the c1 and c2 columns with the values in the _c1 and _c2 columns whenever those latter values are not NaN. Why won't the following work, and what is the correct way to do this?

>>> data.loc[~(data._c1.isna()),['c1','c2']]=data.loc[~(data._c1.isna()),['_c1','_c2']]
>>> data
   c0    c1    c2   _c1   _c2
0   0   NaN   NaN  18.0  19.0
1   3   4.0   5.0   NaN   NaN
2   6   NaN   NaN  20.0  21.0
3   9  10.0  11.0   NaN   NaN
4  12  13.0  14.0   NaN   NaN
5  15  16.0  17.0   NaN   NaN

For completeness's sake I want the result to look like

>>> data.loc[~(data._c1.isna()),['c1','c2']]=data.loc[~(data._c1.isna()),['_c1','_c2']]
>>> data
   c0    c1    c2   _c1   _c2
0   0  18.0  19.0. 18.0  19.0
1   3   4.0   5.0   NaN   NaN
2   6  20.0  21.0  20.0  21.0
3   9  10.0  11.0   NaN   NaN
4  12  13.0  14.0   NaN   NaN
5  15  16.0  17.0   NaN   NaN

Upvotes: 0

Views: 540

Answers (2)

Corralien
Corralien

Reputation: 120559

You can use np.where:

df[['c1', 'c2']] = np.where(df[['_c1', '_c2']].notna(),
                            df[['_c1', '_c2']],
                            df[['c1', 'c2']])
print(df)

# Output:
   c0    c1    c2   _c1   _c2
0   0  18.0  19.0  18.0  19.0
1   3   4.0   5.0   NaN   NaN
2   6  20.0  21.0  20.0  21.0
3   9  10.0  11.0   NaN   NaN
4  12  13.0  14.0   NaN   NaN
5  15  16.0  17.0   NaN   NaN

Update

Do you know by any chance WHY the above doesn't work the way I thought it would?

Your column names from left and right side of your expression are different so Pandas can't use this values even if the shape is the same.

# Left side of your expression
>>> data.loc[~(data._c1.isna()),['c1','c2']]
     c1    c2  # <- note the column names
0  18.0  19.0
2  20.0  21.0

# Right side of your expression
>>> data.loc[~(data._c1.isna()),['_c1','_c2']]
    _c1   _c2  # <- Your column names are difference from left side
0  18.0  19.0
2  20.0  21.0

How to solve it? Simply use .values on the right side. As your right side is not row/column indexed, Pandas use the shape to set the values.

data.loc[~(data._c1.isna()),['c1','c2']] = \
    data.loc[~(data._c1.isna()),['_c1','_c2']].values
print(data)

# Output:
   c0    c1    c2   _c1   _c2
0   0  18.0  19.0  18.0  19.0
1   3   4.0   5.0   NaN   NaN
2   6  20.0  21.0  20.0  21.0
3   9  10.0  11.0   NaN   NaN
4  12  13.0  14.0   NaN   NaN
5  15  16.0  17.0   NaN   NaN

Upvotes: 1

BENY
BENY

Reputation: 323376

I recommend update after rename

df.update(df[['_c1','_c2']].rename(columns={'_c1':'c1','_c2':'c2'}))
df
Out[266]: 
   c0    c1    c2   _c1   _c2
0   0  18.0  19.0  18.0  19.0
1   3   4.0   5.0   NaN   NaN
2   6  20.0  21.0  20.0  21.0
3   9  10.0  11.0   NaN   NaN
4  12  13.0  14.0   NaN   NaN
5  15  16.0  17.0   NaN   NaN

Upvotes: 1

Related Questions