KcH

Reputation: 3502

What's happening in this piece of code from the documentation?

A beginner question:

I am going through the documentation and found this example. I could not understand the conditions for AAA, BBB and CCC in this snippet.

df:

   AAA   BBB   CCC
0    4  2000  2000
1    5   555   555
2    6   555   555
3    7   555   555

then,

df_mask = pd.DataFrame({'AAA': [True] * 4,
                        'BBB': [False] * 4,
                        'CCC': [True, False] * 2})

In [10]: df.where(df_mask, -1000)
Out[10]: 
   AAA   BBB   CCC
0    4 -1000  2000
1    5 -1000 -1000
2    6 -1000   555
3    7 -1000 -1000

May I get a brief explanation of the above snippet?

Upvotes: 4

Views: 120

Answers (2)

jezrael

Reputation: 862731

You can check DataFrame.where:

cond : boolean Series/DataFrame, array-like, or callable
Where cond is True, keep the original value. Where False, replace with corresponding value from other. If cond is callable, it is computed on the Series/DataFrame and should return boolean Series/DataFrame or array. The callable must not change input Series/DataFrame (though pandas doesn’t check it).

other : scalar, Series/DataFrame, or callable
Entries where cond is False are replaced with corresponding value from other. If other is callable, it is computed on the Series/DataFrame and should return scalar or Series/DataFrame. The callable must not change input Series/DataFrame (though pandas doesn’t check it).

So it replaces the values where the mask is False with other, here -1000.

Sample:

import pandas as pd

df = pd.DataFrame({'AAA': [4, 5, 6, 7], 'BBB': [4, 11, 0, 8], 'CCC': [2000, 45, 555, 85]})
print (df)
   AAA  BBB   CCC
0    4    4  2000
1    5   11    45
2    6    0   555
3    7    8    85

df_mask = pd.DataFrame({'AAA': [True] * 4,
                        'BBB': [False] * 4,
                        'CCC': [True, False] * 2})

print (df.where(df_mask, -1000))
   AAA   BBB   CCC
0    4 -1000  2000
1    5 -1000 -1000
2    6 -1000   555
3    7 -1000 -1000

If other is not given, the replaced values become NaN:

print (df.where(df_mask))
   AAA  BBB     CCC
0    4  NaN  2000.0
1    5  NaN     NaN
2    6  NaN   555.0
3    7  NaN     NaN
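
A quick way to check that default, as a sketch reusing the sample frame and mask from above (the names result and replaced are only for illustration): since the sample frame has no missing values to begin with, the NaN positions in the result should be exactly the positions where the mask is False.

```python
import pandas as pd

df = pd.DataFrame({'AAA': [4, 5, 6, 7],
                   'BBB': [4, 11, 0, 8],
                   'CCC': [2000, 45, 555, 85]})
df_mask = pd.DataFrame({'AAA': [True] * 4,
                        'BBB': [False] * 4,
                        'CCC': [True, False] * 2})

# No `other` is given, so entries where the mask is False become NaN
result = df.where(df_mask)

# NaN positions in the result should line up with the False positions of the mask
replaced = result.isna()
print(replaced.equals(~df_mask))
```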

You can also build the mask from a comparison, e.g.:

print (df.where(df > 10, -1000))
    AAA   BBB   CCC
0 -1000 -1000  2000
1 -1000    11    45
2 -1000 -1000   555
3 -1000 -1000    85
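
As the quoted documentation notes, cond may also be a callable. A minimal sketch, assuming the same sample frame (by_mask and by_callable are illustrative names), showing that passing the comparison as a lambda should give the same result as passing the boolean frame directly:

```python
import pandas as pd

df = pd.DataFrame({'AAA': [4, 5, 6, 7],
                   'BBB': [4, 11, 0, 8],
                   'CCC': [2000, 45, 555, 85]})

# Mask built up front from a comparison
by_mask = df.where(df > 10, -1000)

# Same condition passed as a callable; pandas calls it on df to get the mask
by_callable = df.where(lambda frame: frame > 10, -1000)

print(by_callable.equals(by_mask))
```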

Upvotes: 1

moys

Reputation: 8033

Run print(df_mask) and you will get the DataFrame below:

    AAA     BBB     CCC
0   True    False   True
1   True    False   False
2   True    False   True
3   True    False   False

With df.where(df_mask, -1000), you replace the values where the mask is False with -1000, giving the final output below:

   AAA   BBB   CCC
0    4 -1000  2000
1    5 -1000 -1000
2    6 -1000   555
3    7 -1000 -1000
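
To make the rule concrete, here is a rough sketch that rebuilds the same result by hand, cell by cell. It is not how pandas implements where; the frame is reconstructed from the values printed in the question, and manual and expected are illustrative names.

```python
import pandas as pd

# Frame reconstructed from the values printed in the question
df = pd.DataFrame({'AAA': [4, 5, 6, 7],
                   'BBB': [2000, 555, 555, 555],
                   'CCC': [2000, 555, 555, 555]})
df_mask = pd.DataFrame({'AAA': [True] * 4,
                        'BBB': [False] * 4,
                        'CCC': [True, False] * 2})

# For each cell: keep the value if the mask is True, otherwise use -1000
manual = pd.DataFrame(
    {col: [value if keep else -1000
           for value, keep in zip(df[col], df_mask[col])]
     for col in df.columns},
    index=df.index,
)

expected = df.where(df_mask, -1000)
print((manual == expected).all().all())
```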

Upvotes: 1
