Reputation: 3477
I have a dataframe like:
data = {'year': [2020, 2020, 2021, 2021], 'week': [52, 53, 1, 2]}
df = pd.DataFrame(data=data)
year week
0 2020 52
1 2020 53
2 2021 1
3 2021 2
and I would like to get for each line the Monday (date) of that period so something like:
year week period
0 2020 52 2020-12-21
1 2020 53 2020-12-28
2 2021 1 2021-01-04
3 2021 2 2021-01-11
What is the correct way to do this in pandas?
Upvotes: 1
Views: 285
Reputation: 2819
You can try with:
data = {'year': [2020, 2020, 2021, 2021], 'week': [52, 53, 1, 2]}
df = pd.DataFrame(data=data)
df['date combined']=df['year'].astype(str)+'-'+df['week'].astype(str)+ '-1'
df['date of Monday']=pd.to_datetime(df['date combined'], format='%Y-%W-%w')
print(df)
result:
year week date combined date
0 2020 52 2020-52-1 2020-12-28
1 2020 53 2020-53-1 2021-01-04
2 2021 1 2021-1-1 2021-01-04
3 2021 2 2021-2-1 2021-01-11
Upvotes: 0
Reputation: 3477
The right way to do this is to use ISO date format pattern:
df["period"] = pd.to_datetime(
df.year.astype(str) + '-W' + df.week.astype(str) + '-1',
format='%G-W%V-%u')\
.dt.strftime('%Y-%m-%d')
and not
df["period"] = pd.to_datetime(
df.year.astype(str) + '-W' + df.week.astype(str) + '-1',
format='%Y-W%W-%w')\
.dt.strftime('%Y-%m-%d')
Because of ISO week number (cf comments)
Upvotes: 1