Ramon
Ramon

Reputation: 538

How to create a period range with year and week of year in python?

I want to create a range of week of year in the format "YYYY-WW". My goal is to use this range to check several pandas data frames against it to fill missings. Here an example of one of the data frames I have to deal with. Call it "df":

    df
    Product    Period      Sales    
    A          2017-01     39   
    A          2017-02     40   
    A          2017-05     45   
    A          2017-07     28

The column "Period" in the original data frame indicates year and week of the year, and it is in Object format. Below the approach I tried:

import pandas as pd
min, max = df['Period'].min(), df['Period'].max()
range = pd.period_range(min, max, freq='W-SUN')
periods = [r.weekofyear for r in range]

And the result I obtained is:

periods
[53, 1, 2, 3, ...21, 22, 23, 24, 25, 26]

That is, Python (or at least, my Python code) treats the week of the original data as a month. But, despite that, it produces a sequence of weeks, and it sequence starts with 53, not 1, as I expected. If I drop the command "weekofyear" in the last row of my snipet, the results are as follows:

periods
Period('2016-12-26/2017-01-01', 'W-SUN'),
Period('2017-01-02/2017-01-08', 'W-SUN'),
Period('2017-01-09/2017-01-15', 'W-SUN'),
..........
Period('2017-06-12/2017-06-18', 'W-SUN'),
Period('2017-06-19/2017-06-25', 'W-SUN'),
Period('2017-06-26/2017-07-02', 'W-SUN')]

The results I expect to obtain would be something like that:

periods
[2017-01, 2017-02, 2017-03, 2017-04, 2017-05, 2017-06, 2017-07]

And, after using this to look my original data for missings and fill it with zeroes, my data frame without missings would become:

df
Product    Period      Sales    
A          2017-01     39   
A          2017-02     40 
A          2017-03      0
A          2017-04      0    
A          2017-05     45 
A          2017-06      0    
A          2017-07     28

I have consulted several posts like this [https://stackoverflow.com/questions/55222420/converting-year-and-week-of-year-columns-to-date-in-pandas][1], this https://stackoverflow.com/questions/64214117/creating-a-year-week-column-from-distinct-year-and-week-columns-in-pandas-datafr and this https://stackoverflow.com/questions/56528413/how-to-get-all-weeks-between-two-dates, but no avail. Much beter would be if a better solution to fill the misssings exists. Can someone help me on this? Thanks in advance.

Upvotes: 1

Views: 2427

Answers (1)

mozway
mozway

Reputation: 260725

To convert to period, you can first convert to datetime:

df['period'] = (pd.to_datetime(df['Period']+'0', format='%Y-%W%w')
                  .dt.to_period('W-SUN')
                )

Output:

  Product   Period  Sales                 period
0       A  2017-01     39  2017-01-02/2017-01-08
1       A  2017-02     40  2017-01-09/2017-01-15
2       A  2017-05     45  2017-01-30/2017-02-05
3       A  2017-07     28  2017-02-13/2017-02-19

Now, if your goal is to fill the missing values, you don't need an actual period object.

You could reindex with the missing strings using:

s = pd.to_datetime(df['Period']+'0', format='%Y-%W%w')

idx = pd.date_range(s.min(), s.max(), freq='7d').strftime('%Y-%W')
# ['2017-01', '2017-02', '2017-03', '2017-04', '2017-05', '2017-06', '2017-07']

(df.set_index('Period')
   .reindex(idx)
   .fillna({'Sales':0}, downcast='infer')
   .ffill()
   .reset_index()
 )

Output:

     index Product  Sales
0  2017-01       A     39
1  2017-02       A     40
2  2017-03       A      0
3  2017-04       A      0
4  2017-05       A     45
5  2017-06       A      0
6  2017-07       A     28

Upvotes: 1

Related Questions