Make NA based on condition in Pandas DF

Question

I feel like this probably has a simple solution, but I just can't figure it out.

I have a Pandas DF similar to this MWE:

In [92]: test_df = pd.DataFrame({'A': [1,2,3,4,5,6,7,8,9], 'B':[9,8,7,6,5,4,3,2,1]})

In [93]: test_df
Out[93]: 
   A  B
0  1  9
1  2  8
2  3  7
3  4  6
4  5  5
5  6  4
6  7  3
7  8  2
8  9  1

What I want is to set every value in that df that is less than 4 to np.nan. I can get a df of booleans for this criterion:

In [94]: test_df < 4
Out[94]: 
       A      B
0   True  False
1   True  False
2   True  False
3  False  False
4  False  False
5  False  False
6  False   True
7  False   True
8  False   True

But I don't know the final step to make those True values np.nan. I thought this could be achieved with test_df.loc but I wasn't successful in my attempts.

jezrael · Accepted Answer

Use DataFrame.mask. By default, values where the boolean mask is True are replaced by NaN:

print (test_df.mask(test_df < 4))
     A    B
0  NaN  9.0
1  NaN  8.0
2  NaN  7.0
3  4.0  6.0
4  5.0  5.0
5  6.0  4.0
6  7.0  NaN
7  8.0  NaN
8  9.0  NaN
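A minimal sketch of what the call above does to the data, assuming the same test_df as in the question. The names masked and n_missing are only illustrative. It shows that mask returns a new DataFrame, leaves the original untouched, and that the integer columns come back as floats because NaN needs a float dtype:

```python
import pandas as pd

# Same example frame as in the question
test_df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6, 7, 8, 9],
                        'B': [9, 8, 7, 6, 5, 4, 3, 2, 1]})

# mask() returns a new DataFrame; test_df itself is not modified
masked = test_df.mask(test_df < 4)

# Count the cells that were replaced by NaN
n_missing = int(masked.isna().sum().sum())

print(masked.dtypes)   # dtypes of the masked columns
print(n_missing)       # number of NaN cells
print(test_df.dtypes)  # original dtypes, for comparison
```

To keep the result, assign it back, e.g. test_df = test_df.mask(test_df < 4).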

Another solution is to invert the condition and assign the result back:

test_df = test_df[test_df >= 4]
print (test_df)
     A    B
0  NaN  9.0
1  NaN  8.0
2  NaN  7.0
3  4.0  6.0
4  5.0  5.0
5  6.0  4.0
6  7.0  NaN
7  8.0  NaN
8  9.0  NaN
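Indexing a DataFrame with a boolean DataFrame keeps the values where the condition is True and puts NaN elsewhere, which is the same thing DataFrame.where does. A small sketch comparing the three spellings, again assuming the original test_df from the question (the variable names are illustrative only):

```python
import pandas as pd

# Same example frame as in the question
test_df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6, 7, 8, 9],
                        'B': [9, 8, 7, 6, 5, 4, 3, 2, 1]})

# Keep values >= 4, NaN elsewhere (boolean indexing)
via_indexing = test_df[test_df >= 4]

# Same idea spelled out with where(): keep where the condition is True
via_where = test_df.where(test_df >= 4)

# mask() is the inverse: replace where the condition is True
via_mask = test_df.mask(test_df < 4)

# equals() treats NaN values in the same positions as equal
same = via_indexing.equals(via_where) and via_where.equals(via_mask)
print(same)
```

Which one to use is mostly a matter of readability: mask reads naturally when the condition describes the values to drop, where when it describes the values to keep.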
