Reputation: 538
I want to create a range of week of year in the format "YYYY-WW". My goal is to use this range to check several pandas data frames against it to fill missings. Here an example of one of the data frames I have to deal with. Call it "df":
df
Product Period Sales
A 2017-01 39
A 2017-02 40
A 2017-05 45
A 2017-07 28
The column "Period" in the original data frame indicates year and week of the year, and it is in Object format. Below the approach I tried:
import pandas as pd
min, max = df['Period'].min(), df['Period'].max()
range = pd.period_range(min, max, freq='W-SUN')
periods = [r.weekofyear for r in range]
And the result I obtained is:
periods
[53, 1, 2, 3, ...21, 22, 23, 24, 25, 26]
That is, Python (or at least, my Python code) treats the week of the original data as a month. But, despite that, it produces a sequence of weeks, and it sequence starts with 53, not 1, as I expected. If I drop the command "weekofyear" in the last row of my snipet, the results are as follows:
periods
Period('2016-12-26/2017-01-01', 'W-SUN'),
Period('2017-01-02/2017-01-08', 'W-SUN'),
Period('2017-01-09/2017-01-15', 'W-SUN'),
..........
Period('2017-06-12/2017-06-18', 'W-SUN'),
Period('2017-06-19/2017-06-25', 'W-SUN'),
Period('2017-06-26/2017-07-02', 'W-SUN')]
The results I expect to obtain would be something like that:
periods
[2017-01, 2017-02, 2017-03, 2017-04, 2017-05, 2017-06, 2017-07]
And, after using this to look my original data for missings and fill it with zeroes, my data frame without missings would become:
df
Product Period Sales
A 2017-01 39
A 2017-02 40
A 2017-03 0
A 2017-04 0
A 2017-05 45
A 2017-06 0
A 2017-07 28
I have consulted several posts like this [https://stackoverflow.com/questions/55222420/converting-year-and-week-of-year-columns-to-date-in-pandas][1], this https://stackoverflow.com/questions/64214117/creating-a-year-week-column-from-distinct-year-and-week-columns-in-pandas-datafr and this https://stackoverflow.com/questions/56528413/how-to-get-all-weeks-between-two-dates, but no avail. Much beter would be if a better solution to fill the misssings exists. Can someone help me on this? Thanks in advance.
Upvotes: 1
Views: 2427
Reputation: 260725
To convert to period, you can first convert to datetime:
df['period'] = (pd.to_datetime(df['Period']+'0', format='%Y-%W%w')
.dt.to_period('W-SUN')
)
Output:
Product Period Sales period
0 A 2017-01 39 2017-01-02/2017-01-08
1 A 2017-02 40 2017-01-09/2017-01-15
2 A 2017-05 45 2017-01-30/2017-02-05
3 A 2017-07 28 2017-02-13/2017-02-19
Now, if your goal is to fill the missing values, you don't need an actual period object.
You could reindex with the missing strings using:
s = pd.to_datetime(df['Period']+'0', format='%Y-%W%w')
idx = pd.date_range(s.min(), s.max(), freq='7d').strftime('%Y-%W')
# ['2017-01', '2017-02', '2017-03', '2017-04', '2017-05', '2017-06', '2017-07']
(df.set_index('Period')
.reindex(idx)
.fillna({'Sales':0}, downcast='infer')
.ffill()
.reset_index()
)
Output:
index Product Sales
0 2017-01 A 39
1 2017-02 A 40
2 2017-03 A 0
3 2017-04 A 0
4 2017-05 A 45
5 2017-06 A 0
6 2017-07 A 28
Upvotes: 1