user52909
user52909

Reputation: 41

Pandas : How to sum column values over a date range

I am trying to sum the values of colA, over a date range based on "date" column, and store this rolling value in the new column "sum_col" But I am getting the sum of all rows (=100), not just those in the date range.

I can't use rolling or groupby by as my dates (in the real data) are not sequential (some days are missing)

Amy idea how to do this? Thanks.

# Create data frame
df = pd.DataFrame()

# Create datetimes and data
df['date'] = pd.date_range('1/1/2018', periods=100, freq='D')
df['colA']= 1
df['colB']= 2
df['colC']= 3


StartDate = df.date-  pd.to_timedelta(5, unit='D') 
EndDate= df.date

dfx=df
dfx['StartDate'] = StartDate
dfx['EndDate'] = EndDate

dfx['sum_col']=df[(df['date'] > StartDate) & (df['date'] <= EndDate)].sum()['colA']
dfx.head(50)

Upvotes: 4

Views: 10186

Answers (2)

iacob
iacob

Reputation: 24181

I'm not sure whether you want 3 columns for the sum of colA, colB, colC respectively, or one column which sums all three, but here is an example of how you would sum the values for colA:

dfx['colAsum'] = dfx.apply(lambda x: df.loc[(df.date >= x.StartDate) & 
                                            (df.date <= x.EndDate), 'colA'].sum(), axis=1)

e.g. (withperiods=10):

        date  colA  colB  colC  StartDate    EndDate  colAsum
0 2018-01-01     1     2     3 2017-12-27 2018-01-01        1
1 2018-01-02     1     2     3 2017-12-28 2018-01-02        2
2 2018-01-03     1     2     3 2017-12-29 2018-01-03        3
3 2018-01-04     1     2     3 2017-12-30 2018-01-04        4
4 2018-01-05     1     2     3 2017-12-31 2018-01-05        5
5 2018-01-06     1     2     3 2018-01-01 2018-01-06        6
6 2018-01-07     1     2     3 2018-01-02 2018-01-07        6
7 2018-01-08     1     2     3 2018-01-03 2018-01-08        6
8 2018-01-09     1     2     3 2018-01-04 2018-01-09        6
9 2018-01-10     1     2     3 2018-01-05 2018-01-10        6

Upvotes: 5

maurock
maurock

Reputation: 541

If what I understand is correct:

for i in range(df.shape[0]):

    dfx.loc[i,'sum_col']=df[(df['date'] > StartDate[i]) & (df['date'] <= EndDate[i])].sum()['colA']

For example, in range (2018-01-01, 2018-01-06) the sum is 6.

date    colA    colB    colC    StartDate   EndDate sum_col
0   2018-01-01  1   2   3   2017-12-27  2018-01-01  1.0
1   2018-01-02  1   2   3   2017-12-28  2018-01-02  2.0
2   2018-01-03  1   2   3   2017-12-29  2018-01-03  3.0
3   2018-01-04  1   2   3   2017-12-30  2018-01-04  4.0
4   2018-01-05  1   2   3   2017-12-31  2018-01-05  5.0
5   2018-01-06  1   2   3   2018-01-01  2018-01-06  5.0
6   2018-01-07  1   2   3   2018-01-02  2018-01-07  5.0 
7   2018-01-08  1   2   3   2018-01-03  2018-01-08  5.0
8   2018-01-09  1   2   3   2018-01-04  2018-01-09  5.0
9   2018-01-10  1   2   3   2018-01-05  2018-01-10  5.0
10  2018-01-11  1   2   3   2018-01-06  2018-01-11  5.0

Upvotes: 1

Related Questions