Reputation: 79
I'm sure what I'm asking is a simple question, but I have yet to figure it out. I have a pandas DataFrame and I want to run this basic query on it:
SELECT a, b, c
FROM TABLE
WHERE (TABLE.time >= x) AND (TABLE.time <= y)
GROUP BY c
So if I have a table:
A B time
a b time1
c d time2
e f time3
I want to return only a, b, c for the rows where the time falls between the two bounds in the query. Also, would this query on a DataFrame give me another DataFrame if I assign the result to a variable, something like:
df2 = df.query()
I hope this makes sense.
Upvotes: 1
Views: 1877
Reputation: 198
As mentioned in the docs:
The query() method uses a slightly modified Python syntax by default. It is used to apply conditions such as greater than or less than. query() does not support GROUP BY itself; instead, a DataFrame has a groupby() method that serves the same purpose.
I attempted to write code for your query, take a look:
# @x and @y refer to the local Python variables x and y
g = table.query('time >= @x and time <= @y').groupby('c')
for name, group in g:
    print(name, group[['a', 'b', 'c']])
Without using query():
g = table[(table.time >= x) & (table.time <= y)].groupby('c')
for name, group in g:
    print(name, group[['a', 'b', 'c']])
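On the second part of the question (whether assigning the result gives another DataFrame): query() returns a new DataFrame and leaves the original untouched. Here is a minimal sketch of that, with made-up sample data and bounds. The values in table, x, and y are assumptions for illustration, and sum is an arbitrary aggregate chosen only because SQL's GROUP BY needs one for the non-grouped columns.

```python
import pandas as pd

# Hypothetical sample data; the values are made up for illustration.
table = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": [10, 20, 30, 40],
    "c": ["u", "v", "u", "v"],
    "time": [1, 2, 3, 4],
})
x, y = 2, 4  # assumed lower and upper bounds

# query() returns a new DataFrame; `table` itself is not modified.
# The @ prefix refers to local Python variables.
df2 = table.query("time >= @x and time <= @y")

# Aggregate the filtered rows per group (sum is an arbitrary choice here).
summary = df2.groupby("c")[["a", "b"]].sum()

print(type(df2).__name__)
print(summary)
```

If you only need the filtered rows rather than one row per group, df2 is already the result and the groupby step can be dropped.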
Upvotes: 1
Reputation: 79
So I'm sure this isn't the best workaround, but it worked for me.
df = pd.read_excel("file.xlsx", index_col= None, na_values=['NA'] , usecols=[18,4,5,21,0,1])
df2 = df[(df.TIME >= x) & (df.TIME <= y)]
df3 = df2[['a','b','c']]
That helped me get a, b, c within the time range I specified.
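For what it's worth, the row filter and the column selection can probably be combined into one step. This is only a sketch under assumptions: the small DataFrame below stands in for the Excel sheet, and the values of x and y are made up. It uses Series.between(), which is inclusive on both ends by default, together with .loc.

```python
import pandas as pd

# Hypothetical data standing in for the Excel sheet.
df = pd.DataFrame({
    "a": [1, 2, 3],
    "b": [4, 5, 6],
    "c": ["p", "q", "r"],
    "TIME": [10, 20, 30],
})
x, y = 15, 30  # assumed bounds

# Boolean mask: True where x <= TIME <= y (both ends included).
mask = df["TIME"].between(x, y)

# Select the matching rows and the three columns in a single .loc call.
df3 = df.loc[mask, ["a", "b", "c"]]
print(df3)
```

The mask is equivalent to (df.TIME >= x) & (df.TIME <= y); note that the two conditions have to be joined with & rather than a comma.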
Upvotes: 0