Reputation:
I have the following csv file:
type sku quantity country account
Order CHG-FOOD1COMP-CA 1 usa hch
Order CHG-FOOD2COMP-CA 1 usa hch
Order CHG-FOOD2COMP-CA 1 usa hch
Order CHG-FOOD1COMP-CA 1 usa hch
Order CHG-FOODCONT1-CA 1 usa hch
Order usa hch
Order Q7-QDH0-EBB5-CA 1 usa hch
Order CHG-FRY-12PT5-CA 1 usa hch
Order Q7-QDH0-EBB5-CA 1 usa hch
Order Q7-QDH0-EBB5-CA 1 usa hch
Order CHG-FRY-12PT5-CA 1 usa hch
Order CB-BB-CLR12-CA 1 usa hch
Order CB-BB-AMB12-CA 1 usa hch
Order usa hch
Order CB-BB-AMB12-CA 1 usa hch
Order CHG-FRY-12PT5-CA 1 usa hch
Order CB-BB-CLR12-CA 1 usa hch
Order CHG-FRY-12PT5-CA 1 usa hch
Order CHG-FOODCONT1-CA 1 usa hch
Refund CHG-FRY-9PT5-CA 1 usa hch
Order CHG-FOOD1COMP-CA 1 usa hch
I have the following data. I want to get the the total quantity per sku.
SQL: Select sku sum(quantity) As TotalQty, country, account
From (usa_chc_Date.csv)
group by sku,...
I dont mind getting the sum first and then adding those country/account columns that are always the same. My purpose is to store the info in these csv so they are easy to load into django and then delete the files. This is what I am looking for:
sku TotalQty country account
sku1 7 mx chc
sku3 4 mx chc
sku4 2 mx chc
sku5 1 mx chc
sku6 7 mx chc
sku7 9 mx chc
I also named the file to include the country/account info. I guess I could just use the file, and strip the country and account as I save the model.
side note-the accounts do not change since they are on the same report. Once they are loaded they skus can have duplicates but they have different countries.
I tried this:
df = df.groupby(['sku','quantity']).sum()
Upvotes: 1
Views: 6697
Reputation:
df = df.groupby(['sku','Country','Account'],as_index=True)['actual sales'].sum()
df = df.reset_index()
df.rename(columns={0:'count'}, inplace=True)
I changed a column name for my convenience, otherwise irrelevant
Upvotes: 0
Reputation: 3158
You're using pd.groupby
on the wrong colums.
Your question suggests that "country" and "account" are the same for all "sku". In this case you should use:
df.groupby(['sku', 'country', 'account'], as_index=False).quantity.sum()
Out []:
sku country account quantity
0 CB-BB-AMB12-CA usa hch 2
1 CB-BB-CLR12-CA usa hch 2
2 CHG-FOOD1COMP-CA usa hch 3
3 CHG-FOOD2COMP-CA usa hch 2
4 CHG-FOODCONT1-CA usa hch 2
5 CHG-FRY-12PT5-CA usa hch 4
6 CHG-FRY-9PT5-CA usa hch 1
7 Q7-QDH0-EBB5-CA usa hch 3
Note: I removed two lines from your example where there is no "sku" nor "quantity". It these cases should be handled, just tell is in comment.
Upvotes: 1