Mustard Tiger
Mustard Tiger

Reputation: 3671

Pandas sum data in columns occurring after start date

I know that I can manually select column names and find the sum across the row, but is there a way to sum all values that come at or after the start date? I want to only add to a "Sum" column if the column is a date at or after start_date.

I have a pandas dataframe that looks like the following:

d = {'start_date': ['01/2015', '01/2015', '02/2015', '03/2015'],'2015-01': [1, 1, 2, 3], '2015-02': [2, 1, 0, 1], '2015-03': [0, 1, 2, 1]}
df = pd.DataFrame(data=d)
df
   start_date   2015-01   2015-02   2015-03
0   01/2015         1        2         0
1   01/2015         1        1         1
2   02/2015         2        0         2
3   03/2015         3        1         1

I want to get the row-wise sum of the dated columns, from the start date for the row until the end. For example:

df
   start_date   2015-01   2015-02   2015-03   Sum
0   01/2015         1        2         0       3
1   01/2015         1        1         1       3
2   02/2015         2        0         2       2
3   03/2015         3        1         1       1

Upvotes: 2

Views: 165

Answers (1)

BENY
BENY

Reputation: 323226

1st you need convert the format of your date, then we can using numpy broadcast get the condition match , then do sum

df.start_date=pd.to_datetime(df.start_date,format='%m/%Y').dt.strftime('%Y-%m')

s=df.start_date.values[:,None]<=df.columns[1:].values
df['Sum']=(df.iloc[:,1:]*s).sum(1)
df
Out[307]: 
  start_date  2015-01  2015-02  2015-03  Sum
0    2015-01        1        2        0    3
1    2015-01        1        1        1    3
2    2015-02        2        0        2    2
3    2015-03        3        1        1    1

Upvotes: 3

Related Questions