Reputation: 835
How do I summarizes a column based on a dynamic value in the existing row? Using the example below, I'd like to iterate over each row (x
), calculate the sum of all Clicks
where Date == x.Date_Yesterday
, and add the total as a new column.
Input data:
df = pd.DataFrame({
'Date': ['2021-09-14','2021-09-14','2021-09-14','2021-09-13','2021-09-12','2021-09-12','2021-09-11'],
'Date_Yesterday': ['2021-09-13','2021-09-13','2021-09-13','2021-09-12','2021-09-11','2021-09-11','2021-09-10'],
'Clicks': [100,100,100,50,10,10,1]
})
Date Date_Yesterday Clicks
0 2021-09-14 2021-09-13 100
1 2021-09-14 2021-09-13 100
2 2021-09-14 2021-09-13 100
3 2021-09-13 2021-09-12 50
4 2021-09-12 2021-09-11 10
5 2021-09-12 2021-09-11 10
6 2021-09-11 2021-09-10 1
Desired Output data:
Date Date_Yesterday Clicks Total_Clicks_Yesterday
2021-09-14 2021-09-13 100 50
2021-09-14 2021-09-13 100 50
2021-09-14 2021-09-13 100 50
2021-09-13 2021-09-12 50 20
2021-09-12 2021-09-11 10 1
2021-09-12 2021-09-11 10 1
2021-09-11 2021-09-10 1 N/A
Calculating the Total_Clicks_Yesterday
is simple with a static value:
clicks_yesterday = df['Total_Clicks_Yesterday'] = df.loc[df['Date'] == '2021-09-13', 'Clicks'].sum()
print(clicks_yesterday)
Date Date_Yesterday Clicks Total_Clicks_Yesterday
0 2021-09-14 2021-09-13 100 50
1 2021-09-14 2021-09-13 100 50
2 2021-09-14 2021-09-13 100 50
3 2021-09-13 2021-09-12 50 50
4 2021-09-12 2021-09-11 10 50
5 2021-09-12 2021-09-11 10 50
6 2021-09-11 2021-09-10 1 50
but I'm not sure how to make it dynamic for each line item?
Upvotes: 2
Views: 334
Reputation: 3066
in one line:
pd.merge(df, df.groupby('Date')['Clicks'].sum().rename('Total_Clicks_Yesterday'), how='left', left_on='Date_Yesterday', right_index=True)
Here I'm merging two dataframes: the original dataframe, and a second one which is the groupby (documentation).
The groupby
is grouping over the current date, and summing it up:
df.groupby('Date')['Clicks'].sum().rename('Total_Clicks_Yesterday')
Out[16]:
Date
2021-09-11 1
2021-09-12 20
2021-09-13 50
2021-09-14 300
Name: Total_Clicks_Yesterday, dtype: int64
It's the easiest way to get a series as you desired, grouping and summing over a value. Although the name of the index is "date", I'm going to merge it as if it is the date of yesterday.
Now all we need to get this small series together with the original dataframe. I recommend taking a look at Pandas Merging 101 to understand it better.
Basically, I'm merging based on the left dataframe (df
), where I'm comparing the 'Date_Yesterday' column of df
with the index of the groupby df
(which to I refer to as "data of yesterday")
The result:
pd.merge(df, df.groupby('Date')['Clicks'].sum().rename('Total_Clicks_Yesterday'), how='left', left_on='Date_Yesterday', right_index=True)
Out[17]:
Date Date_Yesterday Clicks Total_Clicks_Yesterday
0 2021-09-14 2021-09-13 100 50.0
1 2021-09-14 2021-09-13 100 50.0
2 2021-09-14 2021-09-13 100 50.0
3 2021-09-13 2021-09-12 50 20.0
4 2021-09-12 2021-09-11 10 1.0
5 2021-09-12 2021-09-11 10 1.0
6 2021-09-11 2021-09-10 1 NaN
Upvotes: 2
Reputation: 29635
you can groupby
the column Date and sum
the Clicks to get the number of clicks each day. Then use map
on the column Date_yesterday with the result of the groupby
operation to align the number of clicks with the previous day
df['Total_Clicks_Yesterday'] = df['Date_Yesterday'].map(df.groupby('Date')['Clicks'].sum())
print(df)
Date Date_Yesterday Clicks Total_Clicks_Yesterday
0 2021-09-14 2021-09-13 100 50.0
1 2021-09-14 2021-09-13 100 50.0
2 2021-09-14 2021-09-13 100 50.0
3 2021-09-13 2021-09-12 50 20.0
4 2021-09-12 2021-09-11 10 1.0
5 2021-09-12 2021-09-11 10 1.0
6 2021-09-11 2021-09-10 1 NaN
Upvotes: 7