Cactus Philosopher
Cactus Philosopher

Reputation: 864

Pandas - select rows based on values within a list of columns

Related: Selecting with complex criteria from pandas.DataFrame

I have some DataFrame:

df = pd.DataFrame({'name': ['apple1', 'apple2', 'apple3', 'apple4', 'orange1', 'orange2', 'orange3', 'orange4'], 
                   'A': [0, 0, 0, 0, 0, 0 ,0, 0], 
                  'B': [0.10, -0.15, 0.25, -0.55, 0.50, -0.51, 0.70, 0], 
                  'C': [0, 0, 0.25, -0.55, 0.50, -0.51, 0.70, 0.90],
                  'D': [0.10, -0.15, 0.25, 0, 0.50, -0.51, 0.70, 0.90]})
df
name    A   B   C   D
0   apple1  0   0.10    0.00    0.10
1   apple2  0   -0.15   0.00    -0.15
2   apple3  0   0.25    0.25    0.25
3   apple4  0   -0.55   -0.55   0.00
4   orange1 0   0.50    0.50    0.50
5   orange2 0   -0.51   -0.51   -0.51
6   orange3 0   0.70    0.70    0.70
7   orange4 0   0.00    0.90    0.90

Now let's say I want to select all the rows with values less than 0.25 in A, B, C, and D:

df[(df['A'] < 0.25) & 
  (df['B'] < 0.25) &
  (df['C'] < 0.25) &
  (df['D'] < 0.25)]
    name    A   B   C   D
0   apple1  0   0.10    0.00    0.10
1   apple2  0   -0.15   0.00    -0.15
3   apple4  0   -0.55   -0.55   0.00
5   orange2 0   -0.51   -0.51   -0.51

Great, but can I achieve the same thing using a list of columns as input?

Imagine that I wanted to filter on 100 columns instead of 4.

Upvotes: 2

Views: 1628

Answers (2)

ansev
ansev

Reputation: 30920

Use DataFrame.all with axis=1 to performance a boolean indexing.

df[(df[['A','B','C','D']]<0.25).all(axis=1)]

Output

      name  A     B     C     D
0   apple1  0  0.10  0.00  0.10
1   apple2  0 -0.15  0.00 -0.15
3   apple4  0 -0.55 -0.55  0.00
5  orange2  0 -0.51 -0.51 -0.51

Another approach: pd.Index.difference

df[df[df.columns.difference(['name'])].lt(0.25).all(axis=1)]

Upvotes: 2

Erfan
Erfan

Reputation: 42916

In this case, because you have the same condition over multiple columns, you can use .all over axis=1 to check if the condition is True for all the selected columns:

df[df.loc[:, 'A':].lt(0.25).all(axis=1)]

      name  A     B     C     D
0   apple1  0  0.10  0.00  0.10
1   apple2  0 -0.15  0.00 -0.15
3   apple4  0 -0.55 -0.55  0.00
5  orange2  0 -0.51 -0.51 -0.51

If your columns arn't so well ordered, use double brackets for selection of your data:

df[df[['A', 'B', 'C', 'D']].lt(0.25).all(axis=1)]

Note: .lt is same as < and stands for "less than"


In case you have lot of columns with a single letter, which would be too much to select manually, we can use DataFrame.filter with regex:

df[df.filter(regex='[A-Z]').lt(0.25).all(axis=1)]

Upvotes: 4

Related Questions