Jeff Pernia

Reputation: 79

select rows from a query in DataFrame in Pandas

I'm sure what I'm asking is a simple question, but I have yet to figure it out. I have a pandas DataFrame and I want to run this basic query on it:

SELECT a, b, c
FROM TABLE
WHERE (TABLE.time >= x) AND (TABLE.time <= y)
GROUP BY c

so if I have a table

A    B    time

a    b    time1
c    d    time2
e    f    time3

I want to return only a, b, c for the rows where time falls between the two bounds in the query. Also, would this query on a DataFrame give me another DataFrame if I assign the result to a variable, something like:

df2 = df.query()

I hope this makes sense

Upvotes: 1

Views: 1877

Answers (2)

Krishna

Reputation: 198

As mentioned in the docs:

The query() method uses a slightly modified Python syntax by default. It is used to apply conditions such as greater than or less than. query() does not support grouping itself; instead, DataFrames have a groupby() method that plays the role of GROUP BY.

I attempted to write code for your query; take a look:

# '@' lets query() refer to the local Python variables x and y
g = table.query('time >= @x and time <= @y').groupby('c')

for name, group in g:
    print(name, group[['a', 'b', 'c']])

Without using query():

g = table[(table.time >= x) & (table.time <= y)].groupby('c')

for name, group in g:
    print(name, group[['a', 'b', 'c']])
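To answer the second part of the question, here is a minimal self-contained sketch showing that both forms of filtering return a new DataFrame that can be assigned to a variable. The sample data and the bounds x and y are made up for illustration; only the column names a, b, c and time come from the question. Summing a and b per group is just one possible aggregation, chosen as an assumption.

```python
import pandas as pd

# Hypothetical sample data; only the column names follow the question.
df = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": [10, 20, 30, 40],
    "c": ["u", "v", "u", "v"],
    "time": [1, 2, 3, 4],
})
x, y = 2, 3  # assumed bounds, inclusive on both ends

# query() returns a new DataFrame; '@' refers to local variables.
via_query = df.query("time >= @x and time <= @y")

# The equivalent boolean mask also returns a new DataFrame.
via_mask = df[(df["time"] >= x) & (df["time"] <= y)]

# GROUP BY c, here with a sum over a and b as an example aggregation.
grouped = via_query.groupby("c")[["a", "b"]].sum()
print(grouped)
```

The original df is left unchanged in both cases, so df2 = df.query(...) gives you a separate DataFrame to work with.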

Upvotes: 1

Jeff Pernia

Reputation: 79

I'm sure this isn't the best workaround, but it worked for me.

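import pandas as pd

df = pd.read_excel("file.xlsx", index_col=None, na_values=['NA'], usecols=[18, 4, 5, 21, 0, 1])
# Combine the two conditions with '&'; separating them with a comma builds a tuple and fails
df2 = df[(df.TIME >= x) & (df.TIME <= y)]
df3 = df2[['a', 'b', 'c']]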

That helped me get a, b, c within the time range I specified.
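A slightly shorter variant of the same filter, sketched with made-up data in place of the Excel file: Series.between() is inclusive on both ends by default, and .loc can apply the row filter and the column selection in one step. The values of x and y and the sample rows are assumptions for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the data read from file.xlsx.
df = pd.DataFrame({
    "a": [1, 2, 3],
    "b": [4, 5, 6],
    "c": ["p", "q", "r"],
    "TIME": [10, 20, 30],
})
x, y = 15, 30  # assumed bounds

# between() keeps rows with x <= TIME <= y.
in_range = df["TIME"].between(x, y)

# Filter rows and pick columns in a single .loc call.
df3 = df.loc[in_range, ["a", "b", "c"]]
print(df3)
```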

Upvotes: 0
