Reputation: 3096
Consider the following file test.csv
:
"Time","RegionCode","RegionName","NumValue"
"2009-W40","AT","Austria",0
"2009-W40","BE","Belgium",54
"2009-W40","BG","Bulgaria",0
"2009-W40","CZ","Czech Republic",1
I'd like to parse the date which is stored in the first column and would like to create a dataframe like so:
parser = lambda x: pd.datetime.strptime(x, "%Y-W%W")
df = pd.read_csv("test.csv", parse_dates=["Time"], date_parser=parser)
Result:
Time RegionCode RegionName NumValue
0 2009-01-01 AT Austria 0
1 2009-01-01 BE Belgium 54
2 2009-01-01 BG Bulgaria 0
3 2009-01-01 CZ Czech Republic 1
However, the resulting time column is not correct. All I get is "2019-01-01" and this is certainly not the 40th week of the year. Am I doing something wrong? Anybody else had this issue when parsing weeks?
Upvotes: 0
Views: 308
Reputation: 7164
You are almost correct. The only problem is that from a week number and year, you cannot determine a specific date. The trick is to just add day of the week as 1.
I would recommend sticking with pd.to_datetime()
like you tried initially and supplying a date-format string. That should work out fine with the added 1:
pd.to_datetime(df['Time'] + '-1', format='%Y-W%W-%w')
# 0 2009-10-05
# 1 2009-10-05
# 2 2009-10-05
# 3 2009-10-05
Upvotes: 2
Reputation: 22493
I am not sure if you can parse it directly upon read_csv
, but you can certainly do it after:
import pandas as pd
test = [
["2009-W40","AT","Austria",0],
["2009-W40","BE","Belgium",54],
["2009-W40","BG","Bulgaria",0],
["2009-W40","CZ","Czech Republic",1]]
df = pd.DataFrame(test,columns=["Time","RegionCode","RegionName","NumValue"])
df["Time"] = pd.to_datetime(df["Time"].str[:4],format="%Y") + \
pd.to_timedelta(df["Time"].str[-2:].astype(int).mul(7),unit="days")
print (df)
#
Time RegionCode RegionName NumValue
0 2009-10-08 AT Austria 0
1 2009-10-08 BE Belgium 54
2 2009-10-08 BG Bulgaria 0
3 2009-10-08 CZ Czech Republic 1
Upvotes: 1