user8022517
user8022517

Reputation:

Pandas DataFrame adding column after groupby

I have the following csv file:

type    sku quantity    country account
Order   CHG-FOOD1COMP-CA    1   usa hch
Order   CHG-FOOD2COMP-CA    1   usa hch
Order   CHG-FOOD2COMP-CA    1   usa hch
Order   CHG-FOOD1COMP-CA    1   usa hch
Order   CHG-FOODCONT1-CA    1   usa hch
Order           usa hch
Order   Q7-QDH0-EBB5-CA 1   usa hch
Order   CHG-FRY-12PT5-CA    1   usa hch
Order   Q7-QDH0-EBB5-CA 1   usa hch
Order   Q7-QDH0-EBB5-CA 1   usa hch
Order   CHG-FRY-12PT5-CA    1   usa hch
Order   CB-BB-CLR12-CA  1   usa hch
Order   CB-BB-AMB12-CA  1   usa hch

Order           usa hch
Order   CB-BB-AMB12-CA  1   usa hch
Order   CHG-FRY-12PT5-CA    1   usa hch
Order   CB-BB-CLR12-CA  1   usa hch
Order   CHG-FRY-12PT5-CA    1   usa hch
Order   CHG-FOODCONT1-CA    1   usa hch
Refund  CHG-FRY-9PT5-CA 1   usa hch
Order   CHG-FOOD1COMP-CA    1   usa hch

I have the following data. I want to get the the total quantity per sku.

SQL: Select sku sum(quantity) As TotalQty,  country, account
     From (usa_chc_Date.csv)
     group by sku,...

I dont mind getting the sum first and then adding those country/account columns that are always the same. My purpose is to store the info in these csv so they are easy to load into django and then delete the files. This is what I am looking for:

sku   TotalQty country account
sku1   7       mx     chc
sku3   4       mx     chc
sku4   2       mx     chc
sku5   1       mx     chc
sku6   7       mx     chc
sku7   9       mx     chc

I also named the file to include the country/account info. I guess I could just use the file, and strip the country and account as I save the model.

side note-the accounts do not change since they are on the same report. Once they are loaded they skus can have duplicates but they have different countries.

I tried this:

 df = df.groupby(['sku','quantity']).sum()

Upvotes: 1

Views: 6697

Answers (2)

user8022517
user8022517

Reputation:

df = df.groupby(['sku','Country','Account'],as_index=True)['actual sales'].sum()
df = df.reset_index()
df.rename(columns={0:'count'}, inplace=True)

I changed a column name for my convenience, otherwise irrelevant

Upvotes: 0

FabienP
FabienP

Reputation: 3158

You're using pd.groupby on the wrong colums.

Your question suggests that "country" and "account" are the same for all "sku". In this case you should use:

df.groupby(['sku', 'country', 'account'], as_index=False).quantity.sum()
Out []:
                sku country account  quantity
0    CB-BB-AMB12-CA     usa     hch         2
1    CB-BB-CLR12-CA     usa     hch         2
2  CHG-FOOD1COMP-CA     usa     hch         3
3  CHG-FOOD2COMP-CA     usa     hch         2
4  CHG-FOODCONT1-CA     usa     hch         2
5  CHG-FRY-12PT5-CA     usa     hch         4
6   CHG-FRY-9PT5-CA     usa     hch         1
7   Q7-QDH0-EBB5-CA     usa     hch         3

Note: I removed two lines from your example where there is no "sku" nor "quantity". It these cases should be handled, just tell is in comment.

Upvotes: 1

Related Questions