e1v1s

Reputation: 385

How to search all data frame rows for values outside a defined range of numbers?

So I have a data frame with 50 columns and 400 rows, all of them numeric. I'm trying to display only the columns that contain values outside a predefined range (i.e. only show values that aren't between -1 and +3).

So far I have:

df[(df.T > 3).all()]

to display values greater than 3, and I can change the integer to any other number of interest. But how can I write something that displays numbers falling outside a range (i.e. display all columns that have values outside the range of -1 to +3)?

Upvotes: 4

Views: 2232

Answers (2)

piRSquared

Reputation: 294288

You can use pd.DataFrame.mask:

import numpy as np
import pandas as pd

np.random.seed([3,1415])
df = pd.DataFrame(np.random.randint(-2, 4, (5, 3)), columns=list('abc'))
print(df)

   a  b  c
0 -2  1  0
1  1  0  0
2  3  1  3
3  0  1 -2
4  0 -2 -2

mask replaces every cell where the condition evaluates to True with NaN. Here, values at or beyond the bounds (>= 3 or <= -1) are masked:

df.mask(df.ge(3) | df.le(-1))

     a    b    c
0  NaN  1.0  0.0
1  1.0  0.0  0.0
2  NaN  1.0  NaN
3  0.0  1.0  NaN
4  0.0  NaN  NaN
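
As a side note, mask has a complement, where, which keeps the cells where the condition is True and replaces the rest. A minimal sketch of that relationship, using a small made-up frame rather than the random one above:

```python
import pandas as pd

# Hypothetical data, chosen only to illustrate the behaviour
df = pd.DataFrame({"a": [-2, 1, 3], "b": [1, 0, -2]})

cond = df.ge(3) | df.le(-1)   # True at or beyond the bounds

masked = df.mask(cond)        # NaN where cond is True
kept = df.where(~cond)        # NaN where ~cond is False, i.e. the same cells

print(masked.equals(kept))
```

Either spelling should work; which one reads better depends on whether the condition describes what to hide or what to keep.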

Or the opposite, masking the values inside the range and keeping those outside it:

df.mask(df.lt(3) & df.gt(-1))

     a    b    c
0 -2.0  NaN  NaN
1  NaN  NaN  NaN
2  3.0  NaN  3.0
3  NaN  NaN -2.0
4  NaN -2.0 -2.0
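
If the goal is to keep only the columns (or rows) that contain at least one out-of-range value, one possible approach is to reduce the boolean frame with any. This is a sketch under assumptions: the data and the names low, high and outside are made up, and the bounds -1 and 3 are treated as inside the range:

```python
import pandas as pd

# Hypothetical data: column "b" stays entirely within [-1, 3]
df = pd.DataFrame({
    "a": [-2, 1, 3, 0, 0],
    "b": [1, 0, 1, 1, 2],
    "c": [0, 0, 5, -1, 2],
})

low, high = -1, 3
outside = (df < low) | (df > high)   # True where a cell is out of range

# Columns with at least one out-of-range value
cols = df.loc[:, outside.any()]

# Rows with at least one out-of-range value
rows = df[outside.any(axis=1)]

print(cols)
print(rows)
```

Swapping any for all would instead keep the columns or rows in which every value is out of range.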

Upvotes: 4

EdChum

Reputation: 394051

You could call stack to stack all columns into a single Series so that you can use between to generate a mask for the range, unstack it back to the original shape, invert the mask using ~, and then call dropna(axis=1):

In [193]:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(5,3), columns=list('abc'))
df

Out[193]:
          a         b         c
0  0.088639  0.275458  0.837952
1  1.395237 -0.582110  0.614160
2 -1.114384 -2.774358  2.119473
3  1.050008 -1.195167 -0.343875
4 -0.006156 -2.028601 -0.071448

In [198]:
df[~df.stack().between(0.1,1).unstack()].dropna(axis=1)

Out[198]:
          a
0  0.088639
1  1.395237
2 -1.114384
3  1.050008
4 -0.006156

So here only column 'a' consists entirely of values that are not between 0.1 and 1.

Prior to the dropna you can see that each of the other columns contains at least one value inside the range, which becomes NaN:

In [199]:
df[~df.stack().between(0.1,1).unstack()]

Out[199]:
          a         b         c
0  0.088639       NaN       NaN
1  1.395237 -0.582110       NaN
2 -1.114384 -2.774358  2.119473
3  1.050008 -1.195167 -0.343875
4 -0.006156 -2.028601 -0.071448
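
A possible alternative that avoids the stack/unstack round trip is to apply between column by column and then select the columns with no in-range value. This is only a sketch, not the method above; it hard-codes the values printed in Out[193], and the name inside is made up:

```python
import pandas as pd

# Values copied from the frame shown above
df = pd.DataFrame({
    "a": [0.088639, 1.395237, -1.114384, 1.050008, -0.006156],
    "b": [0.275458, -0.582110, -2.774358, -1.195167, -2.028601],
    "c": [0.837952, 0.614160, 2.119473, -0.343875, -0.071448],
})

# between is a Series method, so apply it to each column in turn
inside = df.apply(lambda col: col.between(0.1, 1))

# Keep the columns in which no value falls inside the range
result = df.loc[:, ~inside.any()]

print(result)
```

On this data it selects the same single column as the dropna(axis=1) version.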

By default the left and right bounds are included. If this isn't required, pass inclusive="neither" to between (pandas 1.3 and later; older versions took inclusive=False).
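
A short sketch of how the inclusive argument changes the result at the bounds. It assumes pandas 1.3 or later, where inclusive takes the strings "both", "neither", "left" or "right" (older versions accepted booleans); the Series is made up for illustration:

```python
import pandas as pd

# Hypothetical values that sit exactly on and around the bounds
s = pd.Series([0.1, 0.5, 1.0, 1.5])

closed = s.between(0.1, 1)                       # bounds included (default)
open_ = s.between(0.1, 1, inclusive="neither")   # bounds excluded

print(closed.tolist())
print(open_.tolist())
```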

Upvotes: 1
