user6162407

Reputation: 267

Sum of values with same month

import pandas as pd

data = {'dates': ['2010-01-29', '2011-06-14', '2012-01-18'], 'values': [4, 3, 8]}
df = pd.DataFrame(data)
# set_index returns a new frame, so assign the result back
df = df.set_index('dates')
df.index = df.index.astype('datetime64[ns]')

Given a dataframe whose index is a date, how would I add a new column called 'Month' that holds the sum of all values falling in the same month, without "entering the future"? That is, each row should only add up values dated on or before its own date.

This is what the column would look like.

'Month': [4, 3, 12] 

Upvotes: 2

Views: 9913

Answers (3)

Jason Brown

Reputation: 374

apply is your friend here

def sum_from_months_prior(row, df):
    '''returns sum of values in row month, 
    from all dates in df prior to row date'''

    month = pd.to_datetime(row).month

    all_dates_prior = df[df.index <= row]
    same_month = all_dates_prior[all_dates_prior.index.month == month]

    return same_month["values"].sum()

data = {'dates': ['2010-01-29', '2011-06-14', '2012-01-18'], 'values': [4, 3, 8]}
df = pd.DataFrame(data)
df.set_index('dates', inplace = True)
df.index = pd.to_datetime(df.index)
df["dates"] = df.index
df.sort_index(inplace = True)

df["Month"] = df["dates"].apply(lambda row: sum_from_months_prior(row, df))
df.drop("dates", axis = 1, inplace = True)

desired df:

            values  Month
dates
2010-01-29       4      4
2011-06-14       3      3
2012-01-18       8     12

Upvotes: 2

piRSquared

Reputation: 294238

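You can use pandas Grouper with a monthly frequency (older pandas versions exposed this as pd.TimeGrouper, which has since been removed; recent releases prefer the alias 'ME' over 'M')

df.groupby(pd.Grouper(freq='M')).sum()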
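Note that grouping by calendar month gives one total per year-month, not the running per-month sum the question asks for. Here is a minimal sketch of that grouping on the question's data. Using to_period('M') on the index is my own choice rather than part of this answer, picked because the period alias is the same across pandas versions; monthly_totals is just an illustrative name.

```python
import pandas as pd

data = {'dates': ['2010-01-29', '2011-06-14', '2012-01-18'], 'values': [4, 3, 8]}
df = pd.DataFrame(data).set_index('dates')
df.index = pd.to_datetime(df.index)

# One total per calendar month (year and month together),
# so January 2010 and January 2012 stay in separate groups.
monthly_totals = df.groupby(df.index.to_period('M'))['values'].sum()
print(monthly_totals)
```

Because each date in the sample falls in a different calendar month, every group here holds a single value.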

Upvotes: 6

Jeff

Reputation: 2228

There are several ways you could do this. The first would be to resample to monthly using df.resample(...).sum().

You could also create a month column from the index with df['month'] = df.index.month and then group on it with df.groupby('month').sum(). Which method is best depends on what you want to do with the data.
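If the goal is the running 'Month' column from the question rather than one total per group, the groupby-on-month idea can be combined with cumsum. This is only a sketch under the assumption that "same month" means the same month of the year regardless of year, which is what the expected output [4, 3, 12] suggests. Swapping cumsum in for sum is my addition, not something this answer states.

```python
import pandas as pd

data = {'dates': ['2010-01-29', '2011-06-14', '2012-01-18'], 'values': [4, 3, 8]}
df = pd.DataFrame(data).set_index('dates')
df.index = pd.to_datetime(df.index)

# The running total follows row order, so sort by date first.
df = df.sort_index()

# Group by month of year (1-12) and take a cumulative sum inside each group.
# Each row gets the sum of same-month values up to and including its own date.
df['Month'] = df.groupby(df.index.month)['values'].cumsum()
print(df)
```

This avoids the row-by-row apply, which re-filters the whole frame for every row.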

Upvotes: 0
