Reputation: 267
import pandas as pd

data = {'dates': ['2010-01-29', '2011-06-14', '2012-01-18'], 'values': [4, 3, 8]}
df = pd.DataFrame(data)
df = df.set_index('dates')  # set_index returns a new frame, so assign the result
df.index = pd.to_datetime(df.index)
I have a dataframe whose index is a date. How would I go about adding a new column called 'Month' that holds the sum of all values for that month, but doesn't "enter the future"? That is, each row should only add up values dated up to its own date.
This is what the column would look like.
'Month': [4, 3, 12]
Upvotes: 2
Views: 9913
Reputation: 374
apply is your friend here.
import pandas as pd

def sum_from_months_prior(row, df):
    '''Return the sum of values that fall in the same calendar month as row,
    over all dates in df up to and including the row date.'''
    month = pd.to_datetime(row).month
    all_dates_prior = df[df.index <= row]
    same_month = all_dates_prior[all_dates_prior.index.month == month]
    return same_month["values"].sum()

data = {'dates': ['2010-01-29', '2011-06-14', '2012-01-18'], 'values': [4, 3, 8]}
df = pd.DataFrame(data)
df.set_index('dates', inplace=True)
df.index = pd.to_datetime(df.index)
df["dates"] = df.index  # temporary column so apply can see each row's date
df.sort_index(inplace=True)
df["Month"] = df["dates"].apply(lambda row: sum_from_months_prior(row, df))
df.drop("dates", axis=1, inplace=True)
desired df:
            values  Month
dates
2010-01-29       4      4
2011-06-14       3      3
2012-01-18       8     12
Upvotes: 2
Reputation: 294238
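You can use pandas Grouper with a monthly frequency (older pandas versions called this TimeGrouper, which has since been removed; recent versions spell the month-end alias 'ME' rather than 'M')
df.groupby(pd.Grouper(freq='M')).sum()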
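For reference, here is a minimal runnable sketch of grouping by month on the question's data. It assumes pd.Grouper (the current form of the time grouper) and uses the month-start alias 'MS', chosen here only because it is spelled the same across pandas versions. Note that this gives one total per month of each year, so January 2010 and January 2012 stay separate.

```python
import pandas as pd

data = {'dates': ['2010-01-29', '2011-06-14', '2012-01-18'], 'values': [4, 3, 8]}
df = pd.DataFrame(data)
df['dates'] = pd.to_datetime(df['dates'])
df = df.set_index('dates').sort_index()

# One bin per calendar month of each year, labelled by the first day of the month.
# Months with no rows appear with a sum of 0.
monthly = df.groupby(pd.Grouper(freq='MS')).sum()

print(monthly.loc[pd.Timestamp('2010-01-01'), 'values'])  # January 2010 total
print(monthly.loc[pd.Timestamp('2012-01-01'), 'values'])  # January 2012 total
```

Because each year's months are binned separately, this on its own does not produce the [4, 3, 12] column from the question.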
Upvotes: 6
Reputation: 2228
There are several ways you could do this. The first would be to resample to monthly using df.resample(...).sum(). You could also create a month column from the index, using df['month'] = df.index.month, then perform a groupby operation, df.groupby('month').sum(). Which method is best depends on what you want to do with the data.
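As a rough sketch of the groupby-on-month idea applied to the question's data: grouping on the index month pools the same month across years, and swapping sum() for cumsum() (my substitution, not something this answer proposes) keeps a running total that only looks backwards in time. It assumes the index is sorted by date.

```python
import pandas as pd

data = {'dates': ['2010-01-29', '2011-06-14', '2012-01-18'], 'values': [4, 3, 8]}
df = pd.DataFrame(data)
df['dates'] = pd.to_datetime(df['dates'])
df = df.set_index('dates').sort_index()  # cumsum below relies on date order

# Total per calendar month, pooled across years (1 = January, 6 = June).
per_month = df.groupby(df.index.month)['values'].sum()

# Running total within each calendar month, so a row never includes later dates.
df['Month'] = df.groupby(df.index.month)['values'].cumsum()

print(df['Month'].tolist())  # [4, 3, 12]
```

The cumsum variant avoids the row-by-row apply, which may matter on larger frames.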
Upvotes: 0