Pandas alternative of the SQL statement

Question

I have this HUGE csv file that columns named timedim, unblendedcost and many more. I am loading this in pandas and trying to do something which is equivalent to this sql statement,

SELECT sum(unblendedcost), timedim from usages group by day(timedim) where usages is my table in database

I did try to load the CSV in a database but its like 6 million rows. Any help will be much appreciated

jezrael · Accepted Answer

It looks like need:

usages.groupby('timedim', as_index=False)['unblendedcost'].sum()

If dtype of column timedim is datetime with time information, use:

usages.unblendedcost.groupby(df.timedim.dt.date, as_index=False).sum()

Sample:

import pandas as pd

usages = pd.DataFrame({'timedim':[1,1,3,3],
                       'unblendedcost':[1,2,3,4],
                       'a':[7,8,9,8]})

print (usages)
   a  timedim  unblendedcost
0  7        1              1
1  8        1              2
2  9        3              3
3  8        3              4

print (usages.groupby('timedim', as_index=False)['unblendedcost'].sum() )
   timedim  unblendedcost
0        1              3
1        3              7

Pandas alternative of the SQL statement

Answers (1)

Related Questions