Reputation: 6020
I have a dataset that looks like the following:
     YR_FW  YIELD
0   201401   12.3
1   201402   10.2
2   201403    7.2
3   201404    8.0
4   201405    1.2
...   ....   ....
96  201446  102.3
97  201447  101.7
98  201448   89.5
99  201449   72.2
100 201450   88.0
101 201451  98.89
I would like to aggregate (sum) YIELD by month so that I end up with 12 rows, one per month. It might look something like this:
Months Summed_YLD
1 value
2 value
... ...
11 value
12 value
Note: YR_FW has gaps at times. For example, 201427 is not in the data frame.
Note 2: If a week overlaps two months, it should be aggregated into the month in which it ends. Weeks start on Monday and end on Sunday (this is not a 5-day work week).
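To make the rule in Note 2 concrete, here is a small sketch that maps a YR_FW code to the month of the Sunday that ends it. It assumes (the question does not say this) that the week numbers follow ISO 8601, where week 1 is the week containing the year's first Thursday; the helper name `ending_month` is hypothetical, and `date.fromisocalendar` needs Python 3.8+.

```python
from datetime import date


def ending_month(yr_fw: int) -> int:
    """Return the month in which the Monday-Sunday week `yr_fw` ends.

    Hypothetical helper; ASSUMES yr_fw is YYYYWW with ISO 8601 week numbers.
    """
    year, week = divmod(yr_fw, 100)
    sunday = date.fromisocalendar(year, week, 7)  # ISO weekday 7 = Sunday
    return sunday.month


# Under ISO numbering, week 5 of 2014 runs Monday 2014-01-27 to Sunday 2014-02-02,
# so it straddles two months and is assigned to February.
print(ending_month(201405))  # 2
```

If the data uses a different week-numbering convention, only the line that computes `sunday` would need to change.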
Upvotes: 2
Views: 1485
Reputation:
Unfortunately, I don't have enough data to fully check the code, but this is what I have arrived at so far. Suppose we have:
import pandas as pd
import numpy as np
df = pd.DataFrame({"YR_FW":[201401,201402,201403,201405,201506],"YIELD":[12.3,10.2,7.2,1.2,3.8]})
    YR_FW  YIELD
0  201401   12.3
1  201402   10.2
2  201403    7.2
3  201405    1.2
4  201506    3.8
with the row for 201404 removed to mimic the missing weeks in your data, and a row from a different year (201506) added. We could then proceed as follows:
df_pd_range = pd.period_range("01/01/2014", "02/07/2016", freq="W")  # weekly (Monday-Sunday) periods; set this to the span of your data (extended to 2016 here to include the test row)
df.YR_FW = df.YR_FW.astype(str).map(lambda a_: a_[:4] + "-" + a_[4:])  # "201401" -> "2014-01"
a_ = [np.logical_and(df_pd_range.year == int(df.YR_FW.iloc[i][:4]), df_pd_range.week == int(df.YR_FW.iloc[i][5:])) for i in range(len(df))]  # for each row, a mask selecting the weekly period with the same year and week number
b_ = [df_pd_range[i][0] for i in a_]  # the matching Period for each row
arrays = [[i.year for i in b_], [i.month for i in b_]]
index = pd.MultiIndex.from_arrays(arrays, names=["year", "month"])
df.set_index(index, inplace=True)
df.groupby(level=[0, 1])["YIELD"].sum()  # summed YIELD grouped by year and month
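If the data spans several years (as with the 201506 row above), the same (year, month) grouping can also be built directly from parsed dates instead of by matching against a period range. This is a sketch rather than this answer's own method: it swaps in `pd.to_datetime` with the format `"%Y%W-%w"`, which assumes YR_FW uses Monday-start `%W` week numbers.

```python
import pandas as pd

df = pd.DataFrame({"YR_FW": [201401, 201402, 201403, 201405, 201506],
                   "YIELD": [12.3, 10.2, 7.2, 1.2, 3.8]})

# Parse each code to the Sunday that ends its week
# (%W: weeks start on Monday; "-0" read by %w: 0 = Sunday, the last day of a %W week).
week_end = pd.to_datetime(df["YR_FW"].astype(str) + "-0", format="%Y%W-%w")

# Group by both year and month so that, e.g., January 2014 and January 2015 stay separate.
monthly = df.groupby([week_end.dt.year.rename("year"),
                      week_end.dt.month.rename("month")])["YIELD"].sum()
print(monthly)
```

Grouping on the month alone would merge rows from different years into one bucket, which is why both levels are kept here.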
Let me know how it goes.
Upvotes: 0
Reputation: 375445
I would convert these to datetimes first:
In [11]: df["YR_FW"] = pd.to_datetime(df["YR_FW"].astype("str") + "-0", format="%Y%W-%w")
In [12]: df
Out[12]:
       YR_FW  YIELD
0 2014-01-12   12.3
1 2014-01-19   10.2
2 2014-01-26    7.2
3 2014-02-02    8.0
4 2014-02-09    1.2
Note that %W numbers the weeks with Monday as the first day of the week, and the '-0' (read by %w, where 0 is Sunday) selects the Sunday of that week. So each resulting date is the last day of the specified week.
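One quick way to check what `"%Y%W-%w"` produces is Python's own `datetime.strptime`, which implements the same directives. The comparison with `date.fromisocalendar` (Python 3.8+) is only there to show that `%W` numbering differs from ISO 8601 numbering; the question does not say which one YR_FW uses, so either choice is an assumption.

```python
from datetime import date, datetime

# %W: week number with Monday as the first day; days before the year's first Monday are week 00.
# %w: weekday as a number, 0 = Sunday, which is the last day of a %W week.
w_sunday = datetime.strptime("201401-0", "%Y%W-%w").date()
print(w_sunday)  # 2014-01-12

# ISO 8601 week 1 of 2014 starts on Monday 2013-12-30, so its Sunday falls a week earlier.
iso_sunday = date.fromisocalendar(2014, 1, 7)
print(iso_sunday)  # 2014-01-05
```

A one-week offset like this can move a week into a different month near month boundaries, so it is worth confirming the convention against a few known dates in the data.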
Now you can use the dt accessor to get the month of each week's last day and group by it:
In [13]: df.groupby(df["YR_FW"].dt.month)[["YIELD"]].sum()
Out[13]:
       YIELD
YR_FW
1       29.7
2        9.2
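The question asks for all 12 months, while the groupby above only returns months that actually occur in the data. Here is a minimal sketch that fills the remaining months with zero using `reindex`. It assumes a single year of data (with several years, group by year as well) and the same `%W` week numbering; the sample rows are taken from the question.

```python
import pandas as pd

# A few rows from the question; 201404 is left out to mimic a missing week.
df = pd.DataFrame({"YR_FW": [201401, 201402, 201403, 201405, 201450, 201451],
                   "YIELD": [12.3, 10.2, 7.2, 1.2, 88.0, 98.89]})

# Sunday that ends each Monday-Sunday week.
week_end = pd.to_datetime(df["YR_FW"].astype(str) + "-0", format="%Y%W-%w")

# Sum per ending month, then reindex so months 1..12 all appear (months with no rows -> 0).
summed = (df.groupby(week_end.dt.month)["YIELD"].sum()
            .reindex(range(1, 13), fill_value=0)
            .rename_axis("Months")
            .rename("Summed_YLD"))
print(summed)
```

Missing weeks such as 201427 need no special handling: a week that is absent simply contributes nothing to its month's total.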
Upvotes: 3