Reputation: 2085
With the data frame constructed as below, but with many more objects (structured as object1
columns) I want to remove (replace with np.nan
) all values of var1
, var2
, var3
when there are smaller than 0 or greater than 100.
example_df = {('meta_info', 'time'): {0: 2100, 1: 2200, 2: 2300, 3: 2400, 4: 100},
('meta_info', 'counter'): {0: 0.0, 1: 1.0, 2: 2.0, 3: 3.0, 4: 4.0},
('meta_info', 'measurement_id'): {0: 1, 1: 1, 2: 1, 3: 1, 4: 1},
('object1', 'grp'): {0: '0', 1: '0', 2: '0', 3: '0', 4: '0'},
('object1', 'id'): {0: '376690',
1: '376690',
2: '376690',
3: '376690',
4: '376690'},
('object1', 'var1'): {0: 34.405149821218195,
1: 25.047388024508773,
2: 94.12283547956514,
3: -38.34383022173205,
4: 60.15259222044418},
('object1', 'var2'): {0: 40.470001220703125,
1: 40.369998931884766,
2: 40.277000427246094,
3: 40.18899917602539,
4: 40.10200119018555},
('object', 'var1'): {0: -4.453429468309658,
1: 82.84217089703611,
2: 145.2084949734712,
3: 79.83440766416545,
4: 87.39526160763526},
('object', 'var2'): {0: 34.0,
1: 33.70000076293945,
2: 33.900001525878906,
3: 34.0,
4: 34.0},
('object', 'var3'): {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0}}
example_df = pd.DataFrame(example_df)
I was trying to use pd.DataFrame.mask
here, but following code fails:
idx = pd.IndexSlice
z = example_df.loc[:, idx[:, ['var1', 'var2', 'var3']]]
example_df.mask((z < 0) | (z > 100))
as it removes all values of other columns than those within z
.
This is confusing as documentation says:
cond : boolean Series/DataFrame, array-like, or callable Where
cond
is False, keep the original value. Where True, replace with corresponding value fromother
.
so I assume that first 5 columns, which are not part of z
view are not checked - otherwise only (object1, id)
would be replaced with NaN
as it is the only columns with values outside the range. Why those columns are filled with NaN
then? Is there any other method that would allow me to check selected part of the DataFrame
at once?
Upvotes: 0
Views: 184
Reputation: 862431
I think you need replace filtered DataFrame z
, because boolean mask is from filtered DataFrame
:
print (z.mask((z < 0) | (z > 100)))
object1 object
var1 var2 var1 var2 var3
0 34.405150 40.470001 NaN 34.000000 0.0
1 25.047388 40.369999 82.842171 33.700001 0.0
2 94.122835 40.277000 NaN 33.900002 0.0
3 NaN 40.188999 79.834408 34.000000 0.0
4 60.152592 40.102001 87.395262 34.000000 0.0
And if want assign output back assign to filtered DataFrame:
idx = pd.IndexSlice
z = example_df.loc[:, idx[:, ['var1', 'var2', 'var3']]]
example_df.loc[:, idx[:, ['var1', 'var2', 'var3']]] = z.mask((z < 0) | (z > 100))
print (example_df)
meta_info object1 \
time counter measurement_id grp id var1 var2
0 2100 0.0 1 0 376690 34.405150 40.470001
1 2200 1.0 1 0 376690 25.047388 40.369999
2 2300 2.0 1 0 376690 94.122835 40.277000
3 2400 3.0 1 0 376690 NaN 40.188999
4 100 4.0 1 0 376690 60.152592 40.102001
object
var1 var2 var3
0 NaN 34.000000 0.0
1 82.842171 33.700001 0.0
2 NaN 33.900002 0.0
3 79.834408 34.000000 0.0
4 87.395262 34.000000 0.0
Upvotes: 2