Reputation: 1345
Assume that I have the following data set
table = [[datetime.datetime(2015, 1, 1), 1, 0.5],
[datetime.datetime(2015, 1, 27), 1, 0.5],
[datetime.datetime(2015, 1, 31), 1, 0.5],
[datetime.datetime(2015, 2, 1), 1, 2],
[datetime.datetime(2015, 2, 3), 1, 2],
[datetime.datetime(2015, 2, 15), 1, 2],
[datetime.datetime(2015, 2, 28), 1, 2],
[datetime.datetime(2015, 3, 1), 1, 3],
[datetime.datetime(2015, 3, 17), 1, 3],
[datetime.datetime(2015, 3, 31), 1, 3]]
df = pd.DataFrame(table, columns=['Date', 'Id', 'Value'])
Now, I'd like to find the last value of each month, move it monthwise to the values of the next month, and finally take the cumulative product of these values. Doing this procedure for the above data should result in (performing each step):
Finding the last entry of each month and moving them monthwise would result in
Date Id Value Temp
0 2015-01-01 1 0.5 NaN
1 2015-01-27 1 0.5 NaN
2 2015-01-31 1 0.5 NaN
3 2015-02-01 1 2.0 0.5
4 2015-02-03 1 2.0 0.5
5 2015-02-15 1 2.0 0.5
6 2015-02-28 1 2.0 0.5
7 2015-03-01 1 3.0 2.0
8 2015-03-17 1 3.0 2.0
9 2015-03-31 1 3.0 2.0
Filling the NaN
's with 1
, taking the cumulative product, and dropping temp
would result in
Date Id Value Result
0 2015-01-01 1 0.5 1
1 2015-01-27 1 0.5 1
2 2015-01-31 1 0.5 1
3 2015-02-01 1 2.0 0.5
4 2015-02-03 1 2.0 0.5
5 2015-02-15 1 2.0 0.5
6 2015-02-28 1 2.0 0.5
7 2015-03-01 1 3.0 1.0
8 2015-03-17 1 3.0 1.0
9 2015-03-31 1 3.0 1.0
I hope that this is clear enough. And if anyone wonders why on earth I'd like to do this is because I've got MTD-data, and it needs to be resampled. Thanks, Tingis.
edit The number of entries per month are "random", as in they can either be as long as the month or shorter (business data...)
Upvotes: 2
Views: 54
Reputation: 24742
The following code does not assume that you have only two rows for each month. The idea is that do the group-wise calculation first, and then populate some NaN using .reindex()
and fill those NaNs using backward fill as we've got the value for the very last entry of each month.
# your data
# ==================================
import pandas as pd
import datetime
table = [[datetime.datetime(2015, 1, 1), 1, 0.5],
[datetime.datetime(2015, 1, 31), 1, 0.5],
[datetime.datetime(2015, 2, 1), 1, 2],
[datetime.datetime(2015, 2, 28), 1, 2],
[datetime.datetime(2015, 3, 1), 1, 3],
[datetime.datetime(2015, 3, 31), 1, 3]]
df = pd.DataFrame(table, columns=['Date', 'Id', 'Value'])
# better to set Date column to index
df = df.set_index('Date')
print(df)
Id Value
Date
2015-01-01 1 0.5
2015-01-31 1 0.5
2015-02-01 1 2.0
2015-02-28 1 2.0
2015-03-01 1 3.0
2015-03-31 1 3.0
# processing
# =================================================
# get last entry from each month
df_temp = df.groupby(lambda idx: idx.month).tail(1)
# do the cumprod, reindex to have the same index as original df, backward fill
df['Result'] = df_temp['Value'].shift(1).fillna(1).cumprod().reindex(df.index).fillna(method='bfill')
print(df)
Id Value Result
Date
2015-01-01 1 0.5 1.0
2015-01-31 1 0.5 1.0
2015-02-01 1 2.0 0.5
2015-02-28 1 2.0 0.5
2015-03-01 1 3.0 1.0
2015-03-31 1 3.0 1.0
For the follow-ups question:
# your data
# ==================================
import pandas as pd
import datetime
table = [[datetime.datetime(2015, 1, 1), 1, 0.5],
[datetime.datetime(2015, 1, 27), 1, 0.5],
[datetime.datetime(2015, 1, 31), 1, 0.5],
[datetime.datetime(2015, 2, 1), 1, 2],
[datetime.datetime(2015, 2, 3), 1, 2],
[datetime.datetime(2015, 2, 15), 1, 2],
[datetime.datetime(2015, 2, 28), 1, 2],
[datetime.datetime(2015, 3, 1), 1, 3],
[datetime.datetime(2015, 3, 17), 1, 3],
[datetime.datetime(2015, 3, 31), 1, 3]]
df1 = pd.DataFrame(table, columns=['Date', 'Id', 'Value'])
df2 = df1.copy()
df2.Id = 2
df = df1.append(df2)
# better to set Date column to index
df = df.set_index('Date')
print(df)
Id Value
Date
2015-01-01 1 0.5
2015-01-27 1 0.5
2015-01-31 1 0.5
2015-02-01 1 2.0
2015-02-03 1 2.0
2015-02-15 1 2.0
2015-02-28 1 2.0
2015-03-01 1 3.0
2015-03-17 1 3.0
2015-03-31 1 3.0
2015-01-01 2 0.5
2015-01-27 2 0.5
2015-01-31 2 0.5
2015-02-01 2 2.0
2015-02-03 2 2.0
2015-02-15 2 2.0
2015-02-28 2 2.0
2015-03-01 2 3.0
2015-03-17 2 3.0
2015-03-31 2 3.0
def my_func(group):
# get last entry from each month
df_temp = group.groupby(lambda idx: idx.month).tail(1)
# do the cumprod, reindex to have the same index as original df
group['Result'] = df_temp['Value'].shift(1).fillna(1).cumprod().reindex(group.index).fillna(method='bfill')
return group
df.groupby('Id').apply(my_func)
Id Value Result
Date
2015-01-01 1 0.5 1.0
2015-01-27 1 0.5 1.0
2015-01-31 1 0.5 1.0
2015-02-01 1 2.0 0.5
2015-02-03 1 2.0 0.5
2015-02-15 1 2.0 0.5
2015-02-28 1 2.0 0.5
2015-03-01 1 3.0 1.0
2015-03-17 1 3.0 1.0
2015-03-31 1 3.0 1.0
2015-01-01 2 0.5 1.0
2015-01-27 2 0.5 1.0
2015-01-31 2 0.5 1.0
2015-02-01 2 2.0 0.5
2015-02-03 2 2.0 0.5
2015-02-15 2 2.0 0.5
2015-02-28 2 2.0 0.5
2015-03-01 2 3.0 1.0
2015-03-17 2 3.0 1.0
2015-03-31 2 3.0 1.0
Upvotes: 1