Laurens Koppenol

Reputation: 3106

Find cells in dataframe where value is between x and y

I want to turn every value in a pandas DataFrame into True / False, depending on whether that value lies between given bounds x and y.

Combining two boolean DataFrames with an 'AND' operator, or any 'between' functionality in pandas, would be fine. I would prefer not to loop over the columns and call pandas.Series.between(x, y) on each one.

Example

Given the following dataframe

>>> df = pd.DataFrame([{1:1,2:2,3:6},{1:9,2:9,3:10}])
>>> df
   1  2   3
0  1  2   6
1  9  9  10

I want all values between x and y (here, strictly between 2 and 10). I can, for example, start with:

>>> df > 2
       1      2     3
0  False  False  True
1   True   True  True

and then do

>>> df < 10
      1     2      3
0  True  True   True
1  True  True  False

But then

>>> df > 2 and df < 10
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\Laurens Koppenol\Anaconda2\lib\site-packages\pandas\core\generic.py", line 731, in __nonzero__
    .format(self.__class__.__name__))
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Upvotes: 3

Views: 1034

Answers (2)

piRSquared

Reputation: 294488

between is a convenient method for this. However, it is only defined for Series objects. We can get around this either by using apply, which operates on each row (or column) as a Series, or by reshaping the DataFrame into a Series with stack.

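Use stack, between and unstack (inclusive="neither" excludes both bounds; pandas versions older than 1.3 spell this inclusive=False):

df.stack().between(2, 10, inclusive="neither").unstack()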
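A minimal sketch of the stack / between / unstack round trip on the question's example frame, assuming pandas 1.3 or later (for the string form of inclusive). The intermediate names stacked, mask_long and mask are illustrative only:

```python
import pandas as pd

df = pd.DataFrame([{1: 1, 2: 2, 3: 6}, {1: 9, 2: 9, 3: 10}])

# stack() turns the 2x3 frame into a Series indexed by (row, column),
# so Series.between can test every cell in one call.
stacked = df.stack()

# inclusive="neither" excludes both endpoints, i.e. 2 < value < 10.
mask_long = stacked.between(2, 10, inclusive="neither")

# unstack() moves the column level back out, restoring the 2x3 shape.
mask = mask_long.unstack()
print(mask)
```

Because the frame has no missing values, the unstacked mask has exactly the same rows and columns as df.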


Upvotes: 0

EdChum

Reputation: 394189

Use & with parentheses (needed because of operator precedence). The keyword and doesn't understand how to treat an array of booleans, hence the error:

In [64]:
df = pd.DataFrame([{1:1,2:2,3:6},{1:9,2:9,3:10}])
(df > 2) & (df < 10)

Out[64]:
       1      2      3
0  False  False   True
1   True   True  False
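To illustrate why the parentheses matter, here is a small sketch on the same frame. It relies only on Python's standard precedence rules: & binds tighter than > and <, so the unparenthesized expression becomes a chained comparison that implicitly uses and. The variable precedence_error is just an illustrative name:

```python
import pandas as pd

df = pd.DataFrame([{1: 1, 2: 2, 3: 6}, {1: 9, 2: 9, 3: 10}])

# With parentheses, each comparison yields a boolean DataFrame and
# & combines them cell by cell.
mask = (df > 2) & (df < 10)

# Without parentheses, Python reads `df > 2 & df < 10` as the chained
# comparison `df > (2 & df) < 10`, i.e. `(df > (2 & df)) and ((2 & df) < 10)`.
# The implicit `and` calls bool() on a DataFrame, which raises ValueError.
try:
    df > 2 & df < 10
    precedence_error = None
except ValueError as exc:
    precedence_error = exc

print(mask)
print(precedence_error)
```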

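It's also possible to use between with apply, but this will be slower for a large df (inclusive="neither" excludes both bounds; pandas versions older than 1.3 spell this inclusive=False):

In [66]:
df.apply(lambda x: x.between(2, 10, inclusive="neither"))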

Out[66]:
       1      2      3
0  False  False   True
1   True   True  False
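One detail worth checking when choosing between the two approaches: between includes both endpoints by default (inclusive="both"), so only the "neither" form matches the strict &-based mask. A short sketch, assuming pandas 1.3 or later, comparing both variants against the equivalent & expressions:

```python
import pandas as pd

df = pd.DataFrame([{1: 1, 2: 2, 3: 6}, {1: 9, 2: 9, 3: 10}])

# apply() calls the lambda once per column (axis=0 by default),
# and each column arrives as a Series, so Series.between is available.
exclusive = df.apply(lambda col: col.between(2, 10, inclusive="neither"))

# With the default inclusive="both", 2 and 10 themselves count as inside.
inclusive = df.apply(lambda col: col.between(2, 10))

print(exclusive.equals((df > 2) & (df < 10)))
print(inclusive.equals((df >= 2) & (df <= 10)))
```

Both comparisons print True on this frame.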

Note that this error is raised whenever you try to combine a boolean DataFrame or Series using the keywords and, or and not. Use &, | and ~ respectively instead, as these bitwise operators work element-wise on arrays.
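A quick sketch of the other two element-wise operators on the same frame. The names inside, outside and negated are illustrative; the check is simply De Morgan's law: "not strictly between 2 and 10" equals "at most 2, or at least 10".

```python
import pandas as pd

df = pd.DataFrame([{1: 1, 2: 2, 3: 6}, {1: 9, 2: 9, 3: 10}])

inside = (df > 2) & (df < 10)     # element-wise AND
outside = (df <= 2) | (df >= 10)  # element-wise OR
negated = ~inside                 # element-wise NOT

# De Morgan: ~(a & b) is the same as (~a) | (~b).
print(outside.equals(negated))
```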

Upvotes: 4
