Reputation: 137
I am trying to convert as set of monthly data points to a weekly basis but to attain that goal, I am breaking the data set down to daily and then aggregating it to the week level. While the aggregation is happening (through groupby), I am unable to breakdown the data into daily level.
Month_End_Date A B C D
2/28/2019 Pikachu Starter 100000 5302
2/28/2019 Jolteon Evolution 250000 7935
3/31/2019 Charmander Starter 62810 5103
3/31/2019 Bulbasaur Starter 16868 6035
4/30/2019 Flareon Evolution 62810 5103
4/30/2019 Eevee Starter 16868 6035
5/31/2019 Glaceon Evolution 62810 5103
5/31/2019 Leafeon Evolution 16868 6035
6/30/2019 Umbreon Evolution 62810 5103
6/30/2019 Espeon Evolution 16868 6035
I am trying to convert say the first row into
Month_End_Date A B C D
2/1/2019 Pikachu Starter 3571.428571 189.3571429
2/2/2019 Pikachu Starter 3571.428571 189.3571429
2/3/2019 Pikachu Starter 3571.428571 189.3571429
2/4/2019 Pikachu Starter 3571.428571 189.3571429
2/5/2019 Pikachu Starter 3571.428571 189.3571429
where the daily values have been divided by 28 (since the february month has 28 days)
I have searched ffill
amongst other things but unable to quite solve the problem
Upvotes: 4
Views: 2178
Reputation: 862581
First remove duplicates per column Month_End_Date
by DataFrame.drop_duplicates
, then DataFrame.resample
by forward filling missing values and last filter only 28
rows per month and year:
#convert column to datetimes and then to first day of month
df['Month_End_Date'] = (pd.to_datetime(df['Month_End_Date'], format='%m/%d/%Y')
.dt.to_period('m').dt.to_timestamp())
df = df.drop_duplicates('Month_End_Date').set_index('Month_End_Date')
#for duplicated last row of data
df.loc[df.index[-1] + pd.offsets.MonthEnd(1)] = df.iloc[-1]
df = df.resample('d').ffill()
df1 = df[df.groupby(df.index.to_period('m')).cumcount() < 28]
print (df1.tail())
A B C D
Month_End_Date
2019-06-24 Umbreon Evolution 62810 5103
2019-06-25 Umbreon Evolution 62810 5103
2019-06-26 Umbreon Evolution 62810 5103
2019-06-27 Umbreon Evolution 62810 5103
2019-06-28 Umbreon Evolution 62810 5103
If need all values, not only first per groups create helper column by counter with GroupBy.cumcount
and resample
chain with groupby
:
df['Month_End_Date'] = (pd.to_datetime(df['Month_End_Date'], format='%m/%d/%Y')
.dt.to_period('m').dt.to_timestamp())
df['g'] = df.groupby('Month_End_Date').cumcount()
df = df.set_index('Month_End_Date')
df.loc[df.index[-1] + pd.offsets.MonthEnd(1)] = df.iloc[-1]
df = df.groupby('g').resample('d').ffill().reset_index(level=0, drop=True)
df2 = df[df.groupby(['g', df.index.to_period('m')]).cumcount() < 28]
print (df2.tail())
A B C D g
Month_End_Date
2019-06-24 Espeon Evolution 16868 6035 1
2019-06-25 Espeon Evolution 16868 6035 1
2019-06-26 Espeon Evolution 16868 6035 1
2019-06-27 Espeon Evolution 16868 6035 1
2019-06-28 Espeon Evolution 16868 6035 1
Upvotes: 2