r0f1
r0f1

Reputation: 3096

Pandas parse week numbers

Consider the following file test.csv:

"Time","RegionCode","RegionName","NumValue"
"2009-W40","AT","Austria",0
"2009-W40","BE","Belgium",54
"2009-W40","BG","Bulgaria",0
"2009-W40","CZ","Czech Republic",1

I'd like to parse the date which is stored in the first column and would like to create a dataframe like so:

parser = lambda x: pd.datetime.strptime(x, "%Y-W%W")
df = pd.read_csv("test.csv", parse_dates=["Time"], date_parser=parser)

Result:

    Time    RegionCode  RegionName  NumValue
0   2009-01-01  AT  Austria 0
1   2009-01-01  BE  Belgium 54
2   2009-01-01  BG  Bulgaria    0
3   2009-01-01  CZ  Czech Republic  1

However, the resulting time column is not correct. All I get is "2019-01-01" and this is certainly not the 40th week of the year. Am I doing something wrong? Anybody else had this issue when parsing weeks?

Upvotes: 0

Views: 308

Answers (2)

KenHBS
KenHBS

Reputation: 7164

You are almost correct. The only problem is that from a week number and year, you cannot determine a specific date. The trick is to just add day of the week as 1.

I would recommend sticking with pd.to_datetime() like you tried initially and supplying a date-format string. That should work out fine with the added 1:

pd.to_datetime(df['Time'] + '-1', format='%Y-W%W-%w')
# 0   2009-10-05
# 1   2009-10-05
# 2   2009-10-05
# 3   2009-10-05

Upvotes: 2

Henry Yik
Henry Yik

Reputation: 22493

I am not sure if you can parse it directly upon read_csv, but you can certainly do it after:

import pandas as pd

test = [
["2009-W40","AT","Austria",0],
["2009-W40","BE","Belgium",54],
["2009-W40","BG","Bulgaria",0],
["2009-W40","CZ","Czech Republic",1]]

df = pd.DataFrame(test,columns=["Time","RegionCode","RegionName","NumValue"])

df["Time"] = pd.to_datetime(df["Time"].str[:4],format="%Y") + \
             pd.to_timedelta(df["Time"].str[-2:].astype(int).mul(7),unit="days")

print (df)

#
        Time RegionCode      RegionName  NumValue
0 2009-10-08         AT         Austria         0
1 2009-10-08         BE         Belgium        54
2 2009-10-08         BG        Bulgaria         0
3 2009-10-08         CZ  Czech Republic         1

Upvotes: 1

Related Questions