Hassam Sheikh
Hassam Sheikh

Reputation: 197

Pandas alternative of the SQL statement

I have this HUGE csv file that columns named timedim, unblendedcost and many more. I am loading this in pandas and trying to do something which is equivalent to this sql statement,

SELECT sum(unblendedcost), timedim from usages group by day(timedim) where usages is my table in database

I did try to load the CSV in a database but its like 6 million rows. Any help will be much appreciated

Upvotes: 1

Views: 197

Answers (1)

jezrael
jezrael

Reputation: 862581

It looks like need:

usages.groupby('timedim', as_index=False)['unblendedcost'].sum() 

If dtype of column timedim is datetime with time information, use:

usages.unblendedcost.groupby(df.timedim.dt.date, as_index=False).sum()

Sample:

import pandas as pd

usages = pd.DataFrame({'timedim':[1,1,3,3],
                       'unblendedcost':[1,2,3,4],
                       'a':[7,8,9,8]})

print (usages)
   a  timedim  unblendedcost
0  7        1              1
1  8        1              2
2  9        3              3
3  8        3              4

print (usages.groupby('timedim', as_index=False)['unblendedcost'].sum() )
   timedim  unblendedcost
0        1              3
1        3              7

Upvotes: 1

Related Questions