Reputation: 197
I have this HUGE csv file that columns named timedim, unblendedcost and many more. I am loading this in pandas and trying to do something which is equivalent to this sql statement,
SELECT sum(unblendedcost), timedim from usages group by day(timedim) where usages is my table in database
I did try to load the CSV in a database but its like 6 million rows. Any help will be much appreciated
Upvotes: 1
Views: 197
Reputation: 862581
It looks like need:
usages.groupby('timedim', as_index=False)['unblendedcost'].sum()
If dtype
of column timedim
is datetime with time information, use:
usages.unblendedcost.groupby(df.timedim.dt.date, as_index=False).sum()
Sample:
import pandas as pd
usages = pd.DataFrame({'timedim':[1,1,3,3],
'unblendedcost':[1,2,3,4],
'a':[7,8,9,8]})
print (usages)
a timedim unblendedcost
0 7 1 1
1 8 1 2
2 9 3 3
3 8 3 4
print (usages.groupby('timedim', as_index=False)['unblendedcost'].sum() )
timedim unblendedcost
0 1 3
1 3 7
Upvotes: 1