Night Walker

Reputation: 21260

Select rows by a condition on column values

I have the following DataFrame:

   A      B
0  1      5
1  2      3
2  3      2
3  4      0
4  5      1

How can I select rows by a condition on the values of column A?

For example, all values that are greater than 3 and less than 6.

Upvotes: 2

Views: 86

Answers (2)

Stefan

Reputation: 42875

You can use boolean indexing, either with conditions for the endpoints of your interval

df[(df.A > 3) & (df.A < 6)]

or with the convenience method .between(), which performs the same kind of comparisons behind the scenes (and is therefore at most marginally slower). Note that its limits are inclusive by default:

df[df.A.between(4, 5)] # uses inclusive limits

to get:

   A  B
3  4  0
4  5  1
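As a minimal, self-contained sketch of the two approaches above, the following rebuilds the DataFrame from the question and checks that both select the same rows. The variable names (mask, by_mask, by_between) are only illustrative.

```python
import pandas as pd

# The DataFrame from the question.
df = pd.DataFrame({"A": [1, 2, 3, 4, 5], "B": [5, 3, 2, 0, 1]})

# Each comparison yields a boolean Series; & combines them element-wise.
# The parentheses are required because & binds tighter than > and <.
mask = (df["A"] > 3) & (df["A"] < 6)
by_mask = df[mask]

# between() with its default inclusive limits selects the same rows here,
# because A holds integers and 4..5 covers exactly the open interval (3, 6).
by_between = df[df["A"].between(4, 5)]

print(by_mask)
print(by_mask.equals(by_between))
```

The equivalence relies on column A holding integers; with floats, between(4, 5) would miss values such as 3.5 that the strict comparisons keep.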

Upvotes: 0

jezrael

Reputation: 862641

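Use between with boolean indexing. You can pass inclusive=False to exclude the endpoints (from pandas 1.3 on this is spelled inclusive='neither', and boolean values are no longer accepted in pandas 2.0):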

print (df[df.A.between(4,5)])

Sample:

import pandas as pd

df = pd.DataFrame({'A': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5,5: 6}, 
                   'B': {0: 5, 1: 3, 2: 2, 3: 0, 4: 2, 5: 1}})
print (df)
   A  B
0  1  5
1  2  3
2  3  2
3  4  0
4  5  2
5  6  1

print (df[df.A.between(4,5)]) #default inclusive=True
   A  B
3  4  0
4  5  2

print (df[df.A.between(3,6, inclusive=False)])
   A  B
3  4  0
4  5  2
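For newer pandas versions, here is a hedged sketch of the same sample using the string form of the inclusive parameter. It assumes pandas 1.3 or later, where inclusive accepts "both", "neither", "left" or "right"; the variable names are only illustrative.

```python
import pandas as pd

# Same sample data as above, written with plain lists.
df = pd.DataFrame({"A": [1, 2, 3, 4, 5, 6], "B": [5, 3, 2, 0, 2, 1]})

# Assumption: pandas >= 1.3, where inclusive is a string, not a bool.
both = df[df["A"].between(3, 6)]                          # 3 <= A <= 6 (default)
neither = df[df["A"].between(3, 6, inclusive="neither")]  # 3 <  A <  6
left = df[df["A"].between(3, 6, inclusive="left")]        # 3 <= A <  6
right = df[df["A"].between(3, 6, inclusive="right")]      # 3 <  A <= 6

print(neither)
```

Here inclusive="neither" plays the role of inclusive=False in the sample above, and the default "both" corresponds to inclusive=True.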

Timings are the same:

df = pd.concat([df]*10000).reset_index(drop=True)

In [427]: %timeit (df[df.A.between(3,6, inclusive=False)])
The slowest run took 4.72 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 1.32 ms per loop

In [428]: %timeit (df[(df.A>3) & (df.A<6)])
1000 loops, best of 3: 1.31 ms per loop

Upvotes: 0
