Reputation: 1
I'm trying to groupby my dataframe into a tree format so that it sorts down in a hierarchical way. DC being the first column that funnels down into Retailer, Store Count, Product descriptions, case volume and velocity in that order. Summing the retailer column into a new column "StoreCt" that is positioned after "Retailer"
The problem I'm running into is the store counts are being duplicated.
Here is the dataframe I have
Retailer | DC | Product | Cs | Volume | Velocity |
---|---|---|---|---|---|
joe | ABC | bars | Cs | Cost | Velocity |
joe | DFC | drinks1 | Cs | Cost | Velocity |
joe | DFC | drinks2 | Cs | Cost | Velocity |
randy | ABC | bars | Cs | Cost | Velocity |
peter | DFC | drinks2 | Cs | Cost | Velocity |
john | XYZ | drinks | Cs | Cost | Velocity |
joe | XYZ | snacks | Cs | Cost | Velocity |
joe | DFC | bars2 | Cs | Cost | Velocity |
This is the result I want. values in the cs, volume, and velocity columns need to be unchanged
DC | Retailer | StoreCt | Product | Cs | Volume | Velocity |
---|---|---|---|---|---|---|
ABC | joe | 1 | bars | Cs | Cost | Velocity |
randy | 1 | bars | Cs | Cost | Velocity | |
DFC | joe | 3 | drinks1 | Cs | Cost | Velocity |
drinks2 | Cs | Cost | Velocity | |||
bars2 | Cs | Cost | Velocity | |||
peter | 1 | drinks2 | Cs | Cost | Velocity | |
XYZ | joe | 1 | snacks | Cs | Cost | Velocity |
john | 1 | drinks | Cs | Cost | Velocity |
this is my code to get the store count, but i can't figure out how to add it into the dataframe without duplicating the values
store_count = df.groupby("Retailer").size().to_frame("StoreCt")
store_count
Upvotes: 0
Views: 43
Reputation: 120419
Use transform
to broadcast the result to all rows:
df['StoreCt'] = df.groupby(['DC', 'Retailer']).transform('size')
print(df)
# Output:
Retailer DC Product Cs Volume Velocity StoreCt
0 joe ABC bars Cs Cost Velocity 1
1 joe DFC drinks1 Cs Cost Velocity 3
2 joe DFC drinks2 Cs Cost Velocity 3
3 randy ABC bars Cs Cost Velocity 1
4 peter DFC drinks2 Cs Cost Velocity 1
5 john XYZ drinks Cs Cost Velocity 1
6 joe XYZ snacks Cs Cost Velocity 1
7 joe DFC bars2 Cs Cost Velocity 3
To get the output, you can reorder the columns:
cols = ['DC', 'Retailer', 'StoreCt', 'Product', 'Cs', 'Volume', 'Velocity']
df = df[cols].sort_values(['DC', 'Retailer'], ignore_index=True)
print(df)
# Output
DC Retailer StoreCt Product Cs Volume Velocity
0 ABC joe 1 bars Cs Cost Velocity
1 ABC randy 1 bars Cs Cost Velocity
2 DFC joe 3 drinks1 Cs Cost Velocity
3 DFC joe 3 drinks2 Cs Cost Velocity
4 DFC joe 3 bars2 Cs Cost Velocity
5 DFC peter 1 drinks2 Cs Cost Velocity
6 XYZ joe 1 snacks Cs Cost Velocity
7 XYZ john 1 drinks Cs Cost Velocity
Upvotes: 1