jakes
jakes

Reputation: 2085

How to use Pandas mask method on the part of Data Frame

With the data frame constructed as below, but with many more objects (structured as object1 columns) I want to remove (replace with np.nan) all values of var1, var2, var3 when there are smaller than 0 or greater than 100.

example_df = {('meta_info', 'time'): {0: 2100, 1: 2200, 2: 2300, 3: 2400, 4: 100},
 ('meta_info', 'counter'): {0: 0.0, 1: 1.0, 2: 2.0, 3: 3.0, 4: 4.0},
 ('meta_info', 'measurement_id'): {0: 1, 1: 1, 2: 1, 3: 1, 4: 1},
 ('object1', 'grp'): {0: '0', 1: '0', 2: '0', 3: '0', 4: '0'},
 ('object1', 'id'): {0: '376690',
  1: '376690',
  2: '376690',
  3: '376690',
  4: '376690'},
 ('object1', 'var1'): {0: 34.405149821218195,
  1: 25.047388024508773,
  2: 94.12283547956514,
  3: -38.34383022173205,
  4: 60.15259222044418},
 ('object1', 'var2'): {0: 40.470001220703125,
  1: 40.369998931884766,
  2: 40.277000427246094,
  3: 40.18899917602539,
  4: 40.10200119018555},
 ('object', 'var1'): {0: -4.453429468309658,
  1: 82.84217089703611,
  2: 145.2084949734712,
  3: 79.83440766416545,
  4: 87.39526160763526},
 ('object', 'var2'): {0: 34.0,
  1: 33.70000076293945,
  2: 33.900001525878906,
  3: 34.0,
  4: 34.0},
 ('object', 'var3'): {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0}}

example_df = pd.DataFrame(example_df)

I was trying to use pd.DataFrame.mask here, but following code fails:

idx = pd.IndexSlice
z = example_df.loc[:, idx[:, ['var1', 'var2', 'var3']]]
example_df.mask((z < 0) | (z > 100))

as it removes all values of other columns than those within z.

This is confusing as documentation says:

cond : boolean Series/DataFrame, array-like, or callable Where cond is False, keep the original value. Where True, replace with corresponding value from other.

so I assume that first 5 columns, which are not part of z view are not checked - otherwise only (object1, id) would be replaced with NaN as it is the only columns with values outside the range. Why those columns are filled with NaN then? Is there any other method that would allow me to check selected part of the DataFrame at once?

Upvotes: 0

Views: 184

Answers (1)

jezrael
jezrael

Reputation: 862431

I think you need replace filtered DataFrame z, because boolean mask is from filtered DataFrame:

print (z.mask((z < 0) | (z > 100)))
     object1                object                
        var1       var2       var1       var2 var3
0  34.405150  40.470001        NaN  34.000000  0.0
1  25.047388  40.369999  82.842171  33.700001  0.0
2  94.122835  40.277000        NaN  33.900002  0.0
3        NaN  40.188999  79.834408  34.000000  0.0
4  60.152592  40.102001  87.395262  34.000000  0.0

And if want assign output back assign to filtered DataFrame:

idx = pd.IndexSlice
z = example_df.loc[:, idx[:, ['var1', 'var2', 'var3']]]

example_df.loc[:, idx[:, ['var1', 'var2', 'var3']]] = z.mask((z < 0) | (z > 100))
print (example_df)
  meta_info                        object1                                \
       time counter measurement_id     grp      id       var1       var2   
0      2100     0.0              1       0  376690  34.405150  40.470001   
1      2200     1.0              1       0  376690  25.047388  40.369999   
2      2300     2.0              1       0  376690  94.122835  40.277000   
3      2400     3.0              1       0  376690        NaN  40.188999   
4       100     4.0              1       0  376690  60.152592  40.102001   

      object                  
        var1       var2 var3  
0        NaN  34.000000  0.0  
1  82.842171  33.700001  0.0  
2        NaN  33.900002  0.0  
3  79.834408  34.000000  0.0  
4  87.395262  34.000000  0.0  

Upvotes: 2

Related Questions