Rohit

Reputation: 6020

Aggregating Fiscal Week data into monthly sum in pandas

I have a dataset that looks like the following:

      YR_FW    YIELD
0    201401     12.3
1    201402     10.2
2    201403     7.2
3    201404     8.0
4    201405     1.2
...  ....     ....
96   201446     102.3
97   201447     101.7
98   201448     89.5
99   201449     72.2
100  201450     88.0
101  201451     98.89

I would like to aggregate (sum) such that I have 12 months. It may look something like this:

Months    Summed_YLD
1            value
2            value
...         ...
11           value
12           value

Note: Some YR_FW values are missing from the data. For example, 201427 is not in the data frame.

Note 2: If a week spans two months, it should be counted in the month in which it ends. A week starts on Monday and ends on Sunday (this is not a 5-day week).

Upvotes: 2

Views: 1485

Answers (2)

user4280261

Unfortunately, I don't have enough data to fully check the code, but this is what I have arrived at so far. Suppose we have:

import pandas as pd
import numpy as np

df = pd.DataFrame({"YR_FW":[201401,201402,201403,201405,201506],"YIELD":[12.3,10.2,7.2,1.2,3.8]})

     YR_FW    YIELD
0    201401     12.3
1    201402     10.2
2    201403     7.2
3    201405     1.2
4    201506     3.8

with week 201404 left out to mimic the gaps in your actual data, and a row from a different year added. We could proceed as follows:

df_pd_range = pd.period_range("01/01/2014","02/07/2016", freq="W") # Set this to the period covered by your data; extended to 2016 here to test the code
df.YR_FW = df.YR_FW.astype(str).map(lambda a_: a_[:4] + "-" + a_[4:])
a_ = [np.logical_and(df_pd_range.year == int(df.YR_FW.iloc[i][:4]),df_pd_range.week==int(df.YR_FW.iloc[i][5:])) for i in range(len(df))] #choose only the period that is present in the data
b_ = [df_pd_range[i][0] for i in a_]
arrays = [[i.year for i in b_],[i.month for i in b_] ]
index = pd.MultiIndex.from_arrays(arrays, names=["year", "month"])
df.set_index(index, inplace=True)
df.groupby(level=["year", "month"])["YIELD"].sum() # Sum of YIELD grouped by year and month

Let me know how it goes.

Upvotes: 0

Andy Hayden

Reputation: 375445

I would convert these to datetimes first:

In [11]: df["YR_FW"] = pd.to_datetime(df["YR_FW"].astype("str") + "-0", format="%Y%W-%w")

In [12]: df
Out[12]:
       YR_FW  YIELD
0 2014-01-12   12.3
1 2014-01-19   10.2
2 2014-01-26    7.2
3 2014-02-02    8.0
4 2014-02-09    1.2

Note that %W specifies that the week starts on Monday, and the '-0' (parsed by %w) selects the Sunday of that week. So each resulting date is the last day of the week specified.
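To see what the format string does in isolation, here is a minimal sketch using only the standard library's datetime.strptime, which follows the same %W/%w conventions. The helper name week_ending is hypothetical, and the sample strings are taken from the data above. One caveat worth checking against your own fiscal calendar: %W counts week 1 from the first Monday of the year (earlier days fall in week 0), which is not the same as ISO week numbering.

```python
from datetime import datetime


def week_ending(yr_fw: str) -> datetime:
    """Return the Sunday that closes week `yr_fw` (e.g. "201401") under %W numbering.

    Hypothetical helper for illustration only.
    """
    # "%Y%W" reads the year and the Monday-based week number;
    # the appended "-0" is read by "%w" as Sunday.
    return datetime.strptime(yr_fw + "-0", "%Y%W-%w")


first = week_ending("201401")
fifth = week_ending("201405")
print(first.date(), fifth.date())  # 2014-01-12 2014-02-09
```

The fifth week ends on 9 February, which is why it lands in month 2 once the dates are grouped by month.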

Now you can use the dt accessor to extract the month of each week's last day and group on it:

In [13]: df.groupby(df["YR_FW"].dt.month)[["YIELD"]].sum()
Out[13]:
       YIELD
YR_FW       
1       29.7
2        9.2
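
If the data covers more than one year, grouping on the month alone would merge, say, January 2014 with January 2015. The following is a rough, self-contained sketch of one way to handle that, assuming the fiscal weeks follow %W numbering. The variable names week_end and monthly, and the index labels Year and Month, are my own choices; Summed_YLD is the column name from the question.

```python
import pandas as pd

# First five rows of the question's data.
df = pd.DataFrame({
    "YR_FW": [201401, 201402, 201403, 201404, 201405],
    "YIELD": [12.3, 10.2, 7.2, 8.0, 1.2],
})

# Sunday that ends each week; the original YR_FW column is left untouched.
week_end = pd.to_datetime(df["YR_FW"].astype(str) + "-0", format="%Y%W-%w")

# Sum only YIELD, keyed by the year and month in which each week ends.
monthly = (
    df["YIELD"]
    .groupby([week_end.dt.year.rename("Year"), week_end.dt.month.rename("Month")])
    .sum()
    .rename("Summed_YLD")
)
print(monthly)
```

Weeks that are absent from the data simply contribute nothing to their month, so gaps such as 201427 need no special handling.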

Upvotes: 3
