jakes
jakes

Reputation: 2085

How can we replace NaNs in only specific rows?

I need to replace NaN values only in specific rows of my data frame. My original looks like this:

import pandas as pd
import numpy as np
pd.DataFrame.from_dict({'col1': 'A B A D E B C D E A'.split(), 
                        'col2': [np.nan, 0, 1, np.nan, np.nan, np.nan, 1, np.nan, np.nan, 0]})

If col1 is equal to A or B and col2 is NaN, then I want to replace it with 0. For other values of col1, the NaNs should remain unchanged.

My first idea was to use pd.DataFrame.replace and write something like this:

data.loc[data['col1'].isin(['A', 'B']), 'col2'].replace({np.nan: 0}) 

but it doesn't seem to work as intended and fills all NaNs with 0.

Upvotes: 2

Views: 61

Answers (4)

Alexander
Alexander

Reputation: 109546

I believe you intended to use a mask.

mask = data['col1'].isin({'A', 'B'})
data.loc[mask, 'col2'] = data.loc[mask, 'col2'].replace({np.nan, 0})
>>> data
  col1  col2
0    A   NaN
1    B   0.0
2    A   1.0
3    D   NaN
4    E   NaN
5    B   NaN
6    C   1.0
7    D   NaN
8    E   NaN
9    A   0.0

Upvotes: 2

the_RR
the_RR

Reputation: 392

data:

    col1    col2
0   A   NaN
1   B   0.0
2   A   1.0
3   D   NaN
4   E   NaN
5   B   NaN
6   C   1.0
7   D   NaN
8   E   NaN
9   A   0.0

script:

mask1 = data['col1'].isin(['A', 'B'])
mask2 = data['col2'].isna()
mask = mask1 & mask2 
data.loc[mask, 'col2'] = 0

output:

    col1    col2
0   A   0.0
1   B   0.0
2   A   1.0
3   D   NaN
4   E   NaN
5   B   0.0
6   C   1.0
7   D   NaN
8   E   NaN
9   A   0.0

Upvotes: 0

Scott Boston
Scott Boston

Reputation: 153460

You can use fillna instead of replace:

df['col2'] = df.loc[df['col1'].isin(['A','B']), 'col2'].fillna(0)
df

Output:

|    | col1   |   col2 |
|---:|:-------|-------:|
|  0 | A      |      0 |
|  1 | B      |      0 |
|  2 | A      |      1 |
|  3 | D      |    nan |
|  4 | E      |    nan |
|  5 | B      |      0 |
|  6 | C      |    nan |
|  7 | D      |    nan |
|  8 | E      |    nan |
|  9 | A      |      0 |

Upvotes: 1

Quang Hoang
Quang Hoang

Reputation: 150735

You can just do a loc:

df.loc[df.col1.isin(['A','B']) & df.col2.isna(), 'col2'] = 0

Upvotes: 2

Related Questions