Reputation: 385
So I have a data frame of 50 columns and 400 rows, all numeric. I'm trying to display only the columns that have values falling outside a pre-defined range (i.e. only show values that aren't between -1 and +3).
So far I have:
df[(df.T > 3).all()]
to display values greater than 3, and I can change the integer to another number of interest. But how can I write something that displays numbers falling outside a range (i.e. display all columns that have values outside the range of -1 to +3)?
Upvotes: 4
Views: 2232
Reputation: 294288
You can use pd.DataFrame.mask. Consider this sample frame:
import numpy as np
import pandas as pd
np.random.seed([3,1415])
df = pd.DataFrame(np.random.randint(-2, 4, (5, 3)), columns=list('abc'))
print(df)
   a  b  c
0 -2  1  0
1  1  0  0
2  3  1  3
3  0  1 -2
4  0 -2 -2
mask replaces the cells where the condition evaluates to True with NaN:
df.mask(df.ge(3) | df.le(-1))
     a    b    c
0  NaN  1.0  0.0
1  1.0  0.0  0.0
2  NaN  1.0  NaN
3  0.0  1.0  NaN
4  0.0  NaN  NaN
Or the opposite, which keeps only the values outside the range:
df.mask(df.lt(3) & df.gt(-1))
     a    b    c
0 -2.0  NaN  NaN
1  NaN  NaN  NaN
2  3.0  NaN  3.0
3  NaN  NaN -2.0
4  NaN -2.0 -2.0
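To go from the masked frame to only the columns the question asks about, one option is to reduce the boolean condition with any() down each column. This is a minimal sketch, assuming the range is inclusive of -1 and +3 and using a small hand-written frame; the names low, high, outside and has_outlier are illustrative and not part of the answer above.

```python
import pandas as pd

# Small hand-written frame so the result is easy to check by eye.
df = pd.DataFrame({
    'a': [-2, 1, 3, 0, 0],
    'b': [1, 0, 1, 1, 2],
    'c': [0, 0, 5, -2, -2],
})

low, high = -1, 3  # assumed inclusive bounds of the accepted range

# True where a cell lies outside [low, high].
outside = df.lt(low) | df.gt(high)

# True for each column that has at least one out-of-range cell.
has_outlier = outside.any()

# Keep only those columns.
cols_with_outliers = df.loc[:, has_outlier]

# Within those columns, keep the offending values and turn the rest into NaN.
only_outliers = cols_with_outliers.where(outside.loc[:, has_outlier])

print(cols_with_outliers)
print(only_outliers)
```

Column 'b' is dropped here because none of its values fall outside the range. Swapping any() for all() would instead keep only the columns in which every value is out of range.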
Upvotes: 4
Reputation: 394051
You could call stack to stack all columns so that you can use between to generate the mask on a range, then invert the mask using ~ and call dropna(axis=1):
In [193]:
df = pd.DataFrame(np.random.randn(5,3), columns=list('abc'))
df
Out[193]:
          a         b         c
0  0.088639  0.275458  0.837952
1  1.395237 -0.582110  0.614160
2 -1.114384 -2.774358  2.119473
3  1.050008 -1.195167 -0.343875
4 -0.006156 -2.028601 -0.071448
In [198]:
df[~df.stack().between(0.1,1).unstack()].dropna(axis=1)
Out[198]:
          a
0  0.088639
1  1.395237
2 -1.114384
3  1.050008
4 -0.006156
So here only column 'a' has all of its values outside the range 0.1 to 1. Prior to the dropna, you can see that the other columns each contain at least one value inside the range, and those cells become NaN:
In [199]:
df[~df.stack().between(0.1,1).unstack()]
Out[199]:
          a         b         c
0  0.088639       NaN       NaN
1  1.395237 -0.582110       NaN
2 -1.114384 -2.774358  2.119473
3  1.050008 -1.195167 -0.343875
4 -0.006156 -2.028601 -0.071448
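The stack/between/unstack round trip is just a way of getting a frame-shaped boolean mask out of a Series method. The sketch below reproduces it on fixed, hand-written data (not the random values above) and compares it with a column-by-column version built with apply; the names in_range and in_range_by_col are illustrative.

```python
import pandas as pd

# Fixed values so the outcome does not depend on a random seed.
df = pd.DataFrame({
    'a': [0.05, 1.40, -1.10],
    'b': [0.30, -0.60, -2.80],
    'c': [0.80, 0.60, 2.10],
})

# stack() turns the frame into one long Series so Series.between applies;
# unstack() restores the original rows-by-columns shape.
in_range = df.stack().between(0.1, 1).unstack()

# The same mask built column by column, without reshaping.
in_range_by_col = df.apply(lambda col: col.between(0.1, 1))

# Cells inside the range become NaN; dropna(axis=1) then drops every
# column that contained at least one in-range value.
result = df[~in_range].dropna(axis=1)

print(result)
```

Only column 'a' survives, because it is the only column with no value between 0.1 and 1. Note that dropna(axis=1) keeps columns whose values are all out of range, which is stricter than keeping columns with any out-of-range value.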
By default the left and right bounds are included. If this isn't required, pass inclusive=False to between (in pandas 1.3 and later the boolean form is deprecated in favour of inclusive='neither').
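The effect of the inclusive argument is easiest to see on values that sit exactly on the bounds. A short sketch, assuming pandas 1.3 or later, where inclusive accepts the strings 'both', 'neither', 'left' and 'right'; the Series s is made up for illustration.

```python
import pandas as pd

s = pd.Series([-1, 0, 3, 4])

# Default behaviour: both endpoints count as inside the range.
inside_closed = s.between(-1, 3)

# Endpoints excluded: -1 and 3 no longer count as inside.
inside_open = s.between(-1, 3, inclusive='neither')

# Inverting either mask gives the out-of-range values.
outside_closed = s[~inside_closed]
outside_open = s[~inside_open]

print(outside_closed.tolist())
print(outside_open.tolist())
```

Which form is appropriate depends on whether values equal to -1 or +3 should be treated as in range, which the question does not specify.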
Upvotes: 1