Bryce Ramgovind
Bryce Ramgovind

Reputation: 3257

DataFrame Groupby while maintaining original DataFrame

I have a DataFrame that has 9 columns which are encoded values for Day of the week(1-7), Week of the Year(1-52), Month of the Year (1-12), Time bin (every 3 hours), Salary Day(0,1) and Holiday(0,1) and Amount(real number). The time is placed in a time bin e.g. 15:00 is placed in 6th time bin and 7:34 is placed in the 3rd time bin.

Day Week    Month   Time    Salary  Holiday Amount
1   5       2      1        0       0        700.0
1   5       2      1        0       0        800.0
3   18      5      7        1       0        90.0

Basically, I need to group the data by the first 6 columns and create a new column which is the summed value of Amount. However, this amount is will be repeated since I need to maintain the original dataframe. So, it must only sum purchases that have the same Day, Week, Month, Time, Salary and Holiday i.e. sum in that group.

Day Week    Month   Time    Salary  Holiday Amount  Sum
1   5       2      1        0       0        700.0   1500.0
1   5       2      1        0       0        800.0   1500.0
3   18      5      7        1       0        90.0    90.0

I have grouped the data frame by Day, Week, Month, Time, Salary, Holiday and summed Amount.

temp= features.groupby(by=["Day", "Week", "Month", "Time", "Salary", "Holiday"])["Amount"].sum()

Upvotes: 2

Views: 2802

Answers (2)

akuiper
akuiper

Reputation: 214927

You can use transform to return a column of the same size of the original data frame, from the docs:

The transform method returns an object that is indexed the same (same size) as the one being grouped. Thus, the passed transform function should return a result that is the same size as the group chunk.

df['Sum'] = df.groupby(["Day", "Week", "Month", "Time", "Salary", "Holiday"]).transform('sum')

df    
# Day Week Month Time Salary Holiday Amount  Sum
#0  1   5      2    1      0       0    700 1500
#1  1   5      2    1      0       0    800 1500
#2  3  18      5    7      1       0     90   90

Upvotes: 2

Greg Friedman
Greg Friedman

Reputation: 341

You can reset the index on temp and then do an outer merge with the original feature dataframe on all the columns you grouped by.

result = features.merge(temp.reset_index(), on=["Day", "Week", "Month", "Time", "Salary", "Holiday"])

Upvotes: 0

Related Questions