David Beaudway
David Beaudway

Reputation: 835

Sum dataframe column by conditional row criteria

How do I summarizes a column based on a dynamic value in the existing row? Using the example below, I'd like to iterate over each row (x), calculate the sum of all Clicks where Date == x.Date_Yesterday, and add the total as a new column.

Input data:

df = pd.DataFrame({
    'Date': ['2021-09-14','2021-09-14','2021-09-14','2021-09-13','2021-09-12','2021-09-12','2021-09-11'],
    'Date_Yesterday': ['2021-09-13','2021-09-13','2021-09-13','2021-09-12','2021-09-11','2021-09-11','2021-09-10'],
    'Clicks': [100,100,100,50,10,10,1]
})


   Date           Date_Yesterday  Clicks
0  2021-09-14     2021-09-13      100
1  2021-09-14     2021-09-13      100
2  2021-09-14     2021-09-13      100
3  2021-09-13     2021-09-12      50
4  2021-09-12     2021-09-11      10
5  2021-09-12     2021-09-11      10
6  2021-09-11     2021-09-10      1

Desired Output data:

Date          Date_Yesterday   Clicks  Total_Clicks_Yesterday
2021-09-14    2021-09-13       100     50
2021-09-14    2021-09-13       100     50
2021-09-14    2021-09-13       100     50
2021-09-13    2021-09-12       50      20
2021-09-12    2021-09-11       10      1
2021-09-12    2021-09-11       10      1
2021-09-11    2021-09-10       1       N/A

Calculating the Total_Clicks_Yesterday is simple with a static value:

clicks_yesterday = df['Total_Clicks_Yesterday'] = df.loc[df['Date'] == '2021-09-13', 'Clicks'].sum()
print(clicks_yesterday)

         Date Date_Yesterday  Clicks  Total_Clicks_Yesterday
0  2021-09-14     2021-09-13     100                      50
1  2021-09-14     2021-09-13     100                      50
2  2021-09-14     2021-09-13     100                      50
3  2021-09-13     2021-09-12      50                      50
4  2021-09-12     2021-09-11      10                      50
5  2021-09-12     2021-09-11      10                      50
6  2021-09-11     2021-09-10       1                      50

but I'm not sure how to make it dynamic for each line item?

Upvotes: 2

Views: 334

Answers (2)

Roim
Roim

Reputation: 3066

in one line:

pd.merge(df, df.groupby('Date')['Clicks'].sum().rename('Total_Clicks_Yesterday'), how='left', left_on='Date_Yesterday', right_index=True)

Here I'm merging two dataframes: the original dataframe, and a second one which is the groupby (documentation).

The groupby is grouping over the current date, and summing it up:

df.groupby('Date')['Clicks'].sum().rename('Total_Clicks_Yesterday')
Out[16]: 
Date
2021-09-11      1
2021-09-12     20
2021-09-13     50
2021-09-14    300
Name: Total_Clicks_Yesterday, dtype: int64

It's the easiest way to get a series as you desired, grouping and summing over a value. Although the name of the index is "date", I'm going to merge it as if it is the date of yesterday.

Now all we need to get this small series together with the original dataframe. I recommend taking a look at Pandas Merging 101 to understand it better. Basically, I'm merging based on the left dataframe (df), where I'm comparing the 'Date_Yesterday' column of df with the index of the groupby df (which to I refer to as "data of yesterday")

The result:

pd.merge(df, df.groupby('Date')['Clicks'].sum().rename('Total_Clicks_Yesterday'), how='left', left_on='Date_Yesterday', right_index=True)
Out[17]: 
         Date Date_Yesterday  Clicks  Total_Clicks_Yesterday
0  2021-09-14     2021-09-13     100                    50.0
1  2021-09-14     2021-09-13     100                    50.0
2  2021-09-14     2021-09-13     100                    50.0
3  2021-09-13     2021-09-12      50                    20.0
4  2021-09-12     2021-09-11      10                     1.0
5  2021-09-12     2021-09-11      10                     1.0
6  2021-09-11     2021-09-10       1                     NaN

Upvotes: 2

Ben.T
Ben.T

Reputation: 29635

you can groupby the column Date and sum the Clicks to get the number of clicks each day. Then use map on the column Date_yesterday with the result of the groupby operation to align the number of clicks with the previous day

df['Total_Clicks_Yesterday'] = df['Date_Yesterday'].map(df.groupby('Date')['Clicks'].sum())
print(df)
         Date Date_Yesterday  Clicks  Total_Clicks_Yesterday
0  2021-09-14     2021-09-13     100                    50.0
1  2021-09-14     2021-09-13     100                    50.0
2  2021-09-14     2021-09-13     100                    50.0
3  2021-09-13     2021-09-12      50                    20.0
4  2021-09-12     2021-09-11      10                     1.0
5  2021-09-12     2021-09-11      10                     1.0
6  2021-09-11     2021-09-10       1                     NaN

Upvotes: 7

Related Questions