Reputation: 3671
I know that I can manually select column names and find the sum across the row, but is there a way to sum all values that come at or after the start date? I want to only add to a "Sum" column if the column is a date at or after start_date
.
I have a pandas dataframe that looks like the following:
d = {'start_date': ['01/2015', '01/2015', '02/2015', '03/2015'],'2015-01': [1, 1, 2, 3], '2015-02': [2, 1, 0, 1], '2015-03': [0, 1, 2, 1]}
df = pd.DataFrame(data=d)
df
start_date 2015-01 2015-02 2015-03
0 01/2015 1 2 0
1 01/2015 1 1 1
2 02/2015 2 0 2
3 03/2015 3 1 1
I want to get the row-wise sum of the dated columns, from the start date for the row until the end. For example:
df
start_date 2015-01 2015-02 2015-03 Sum
0 01/2015 1 2 0 3
1 01/2015 1 1 1 3
2 02/2015 2 0 2 2
3 03/2015 3 1 1 1
Upvotes: 2
Views: 165
Reputation: 323226
1st you need convert the format of your date, then we can using numpy
broadcast get the condition match , then do sum
df.start_date=pd.to_datetime(df.start_date,format='%m/%Y').dt.strftime('%Y-%m')
s=df.start_date.values[:,None]<=df.columns[1:].values
df['Sum']=(df.iloc[:,1:]*s).sum(1)
df
Out[307]:
start_date 2015-01 2015-02 2015-03 Sum
0 2015-01 1 2 0 3
1 2015-01 1 1 1 3
2 2015-02 2 0 2 2
3 2015-03 3 1 1 1
Upvotes: 3