Reputation: 3257
I have a DataFrame that has 9 columns which are encoded values for Day of the week(1-7), Week of the Year(1-52), Month of the Year (1-12), Time bin (every 3 hours), Salary Day(0,1) and Holiday(0,1) and Amount(real number). The time is placed in a time bin e.g. 15:00 is placed in 6th time bin and 7:34 is placed in the 3rd time bin.
Day Week Month Time Salary Holiday Amount
1 5 2 1 0 0 700.0
1 5 2 1 0 0 800.0
3 18 5 7 1 0 90.0
Basically, I need to group the data by the first 6 columns and create a new column which is the summed value of Amount. However, this amount is will be repeated since I need to maintain the original dataframe. So, it must only sum purchases that have the same Day, Week, Month, Time, Salary and Holiday i.e. sum in that group.
Day Week Month Time Salary Holiday Amount Sum
1 5 2 1 0 0 700.0 1500.0
1 5 2 1 0 0 800.0 1500.0
3 18 5 7 1 0 90.0 90.0
I have grouped the data frame by Day, Week, Month, Time, Salary, Holiday and summed Amount.
temp= features.groupby(by=["Day", "Week", "Month", "Time", "Salary", "Holiday"])["Amount"].sum()
Upvotes: 2
Views: 2802
Reputation: 214927
You can use transform
to return a column of the same size of the original data frame, from the docs:
The transform method returns an object that is indexed the same (same size) as the one being grouped. Thus, the passed transform function should return a result that is the same size as the group chunk.
df['Sum'] = df.groupby(["Day", "Week", "Month", "Time", "Salary", "Holiday"]).transform('sum')
df
# Day Week Month Time Salary Holiday Amount Sum
#0 1 5 2 1 0 0 700 1500
#1 1 5 2 1 0 0 800 1500
#2 3 18 5 7 1 0 90 90
Upvotes: 2
Reputation: 341
You can reset the index on temp and then do an outer merge with the original feature dataframe on all the columns you grouped by.
result = features.merge(temp.reset_index(), on=["Day", "Week", "Month", "Time", "Salary", "Holiday"])
Upvotes: 0