Reputation: 38619
I just got tripped on this: consider this example:
>>> import pandas as pd
>>>
df = pd.DataFrame({
"key":[1,3,6,10,15,21],
"columnA":[10,20,30,40,50,60],
"columnB":[100,200,300,400,500,600],
"columnC":[110,202,330,404,550,606],
})
>>> df
key columnA columnB columnC
0 1 10 100 110
1 3 20 200 202
2 6 30 300 330
3 10 40 400 404
4 15 50 500 550
5 21 60 600 606
So, I want to extract data from this table, where key
column (assume it grows monotonically) lies between two values (say 2 and 15) - but only for some specific columns (say, "columnA" and "columnC". Hopefully, this can be done in a one-liner.
Now, if I want to use the .between()
method - it basically returns true/false for all rows:
>>> df['key'].between(2, 16)
0 False
1 True
2 True
3 True
4 True
5 False
So, to actually extract those rows, I need to put the above command in square brackets:
>>> df[df['key'].between(2, 16)]
key columnA columnB columnC
1 3 20 200 202
2 6 30 300 330
3 10 40 400 404
4 15 50 500 550
Excellent, this is what I need - I just need to limit the columns; so I try this:
>>> df[df['key'].between(2, 16), ["columnA"]]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:/msys64/mingw64/lib/python3.8/site-packages/pandas/core/frame.py", line 2800, in __getitem__
indexer = self.columns.get_loc(key)
File "C:/msys64/mingw64/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 2646, in get_loc
return self._engine.get_loc(key)
File "pandas/_libs/index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 116, in pandas._libs.index.IndexEngine.get_loc
TypeError: '(0 False
1 True
2 True
3 True
4 True
5 False
Name: key, dtype: bool, ['columnA'])' is an invalid key
Uh... no dice.
So, how can I do the above, and limit specific columns?
Upvotes: 2
Views: 3408
Reputation: 1758
You can just use the standard way of slicing DataFrames:
df[df['key'].between(2,16)][['key','columnA','columnC']]
Upvotes: 3
Reputation: 38619
Well, turns out, I need to use .loc
:
>>> df.loc[df['key'].between(2, 16), ["columnA"]]
columnA
1 20
2 30
3 40
4 50
... or rather, as I originally wanted it (and also adding the "key" column):
>>> df.loc[df['key'].between(2, 16), ["key", "columnA", "columnC"]]
key columnA columnC
1 3 20 202
2 6 30 330
3 10 40 404
4 15 50 550
Upvotes: 0