Reputation: 30605
Let's suppose I create a dataframe with named columns and query it, i.e.
pd.DataFrame([[1,2],[3,4],[5,6]],columns=['a','b']).query('a>1')
This will give me
   a  b
1  3  4
2  5  6
But when the dataframe is very large and has no column names, how can I query a column by its index?
I tried passing a number in the query, but that doesn't work.
pd.DataFrame([[1,2],[3,4],[5,6]]).query('0>1') # This is what I tried.
How do I denote that 0 is the column name in the query?
Expected Output:
   0  1
1  3  4
2  5  6
Upvotes: 9
Views: 7070
Reputation: 798
You can create an intermediate column with assign + a lambda function:
pd.DataFrame([[1, 2], [3, 4], [5, 6]]).assign(col=lambda x: x[0]).query("col>1")
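Note that the intermediate column stays in the result. A minimal sketch of how it could be dropped again after filtering, assuming the helper name col does not clash with an existing column:

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4], [5, 6]])

result = (
    df.assign(col=lambda x: x[0])  # copy column 0 under a string name (col is an arbitrary helper name)
      .query("col > 1")            # query can now refer to it by name
      .drop(columns="col")         # remove the helper column again
)
print(result)
```

The result keeps only the original integer-labelled columns 0 and 1.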
Upvotes: 0
Reputation: 7353
An option without any monkey patching is to use @ to reference a Python variable inside the query, as follows.
# If you are fond of one-liners
df = pd.DataFrame([[1,2],[3,4],[5,6]]); df.query('@df[0] > 1')
# Otherwise this is the same as
df = pd.DataFrame([[1,2],[3,4],[5,6]])
df.query('@df[0] > 1') # @df refers to the variable df
Output:
   0  1
1  3  4
2  5  6
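Since @ can refer to any variable in scope, the column can also be pulled into its own variable first. The sketch below (first_col is just an illustrative name) compares that with plain boolean indexing, which should select the same rows:

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4], [5, 6]])

first_col = df[0]                         # illustrative name for the column at label 0
via_query = df.query("@first_col > 1")    # @first_col refers to the local variable
via_mask = df[df[0] > 1]                  # plain boolean indexing, no query involved

print(via_query.equals(via_mask))
```

If query is not a hard requirement, the boolean mask is probably the simplest way to filter on an unnamed column.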
You can find more ways of dealing with this here.
Upvotes: 3
Reputation: 30605
Since query is still under development, one possible solution is to monkey patch pd.DataFrame with a method that evaluates expressions containing self, i.e.:
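def query_cols(self, expr):
    # Expressions that mention self are evaluated as a boolean mask
    if 'self' in expr:
        return self[eval(expr)]
    # Everything else goes through the regular query
    else:
        return self.query(expr)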
pd.DataFrame.query_cols = query_cols
pd.DataFrame([[1,2],[3,4],[5,6]]).query_cols('self[1] > 3')
   0  1
1  3  4
2  5  6
pd.DataFrame([[1,2],[3,4],[5,6]]).query_cols('self[1] == 4')
   0  1
1  3  4
pd.DataFrame([[1,2],[3,4],[5,6]],columns=['a','b']).query_cols('a > 3')
   a  b
2  5  6
This is a simple trick and doesn't suit all cases. The answer will be updated when the issue with query is resolved.
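For cases where patching pd.DataFrame or calling eval is undesirable, here is a rough sketch of a standalone helper that temporarily gives the columns string names. The function name query_by_position and the c prefix are made up for illustration and are not part of pandas:

```python
import pandas as pd

def query_by_position(df, expr, prefix="c"):
    """Query df using positional names such as c0, c1 (hypothetical convention)."""
    original_columns = df.columns
    renamed = df.copy()
    # Give every column a string name based on its position: c0, c1, ...
    renamed.columns = [f"{prefix}{i}" for i in range(df.shape[1])]
    result = renamed.query(expr).copy()
    # Restore the original labels on the filtered rows
    result.columns = original_columns
    return result

out = query_by_position(pd.DataFrame([[1, 2], [3, 4], [5, 6]]), "c0 > 1")
print(out)
```

This copies the frame, so it trades memory for not touching the DataFrame class.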
Upvotes: 7