Reputation: 69
I have a csv-file: https://data.rivm.nl/covid-19/COVID-19_aantallen_gemeente_per_dag.csv
I want to use it to provide insight into the corona deaths per week.
df = pd.read_csv("covid.csv", error_bad_lines=False, sep=";") df = df.loc[df['Deceased'] > 0] df["Date_of_publication"] = pd.to_datetime(df["Date_of_publication"]) df["Week"] = df["Date_of_publication"].dt.isocalendar().week df["Year"] = df["Date_of_publication"].dt.year df = df[["Week", "Year", "Municipality_name", "Deceased"]] df = df.groupby(by=["Week", "Year", "Municipality_name"]).agg({"Deceased" : "sum"}) df = df.sort_values(by=["Year", "Week"]) print(df)
Everything seems to be working fine except for the first 3 days of 2021. The first 3 days of 2021 are part of the last week (53) of 2020: http://week-number.net/calendar-with-week-numbers-2021.html.
When I print the dataframe this is the result:
53 2021 Winterswijk 1 Woudenberg 1 Zaanstad 1 Zeist 2 Zutphen 1
So basically what I'm looking for is a way where this line returns the year of the week number and not the year of the date:
df["Year"] = df["Date_of_publication"].dt.year
Upvotes: 3
Views: 1118
Reputation: 23217
You can use dt.isocalendar().year
to setup df["Year"]
:
df["Year"] = df["Date_of_publication"].dt.isocalendar().year
You will get year 2020 for date of 2021-01-01 but will get back to year 2021 for date of 2021-01-04 by this.
This is just similar to how you used dt.isocalendar().week
for setting up df["Week"]
. Since they are both basing on the same tuple (year, week, day)
returned by dt.isocalendar()
, they would always be in sync.
date_s = pd.Series(pd.date_range(start='2021-01-01', periods=5, freq='1D'))
date_s
0
0 2021-01-01
1 2021-01-02
2 2021-01-03
3 2021-01-04
4 2021-01-05
date_s.dt.isocalendar()
year week day
0 2020 53 5
1 2020 53 6
2 2020 53 7
3 2021 1 1
4 2021 1 2
Upvotes: 4
Reputation: 494
You can simply subtract the two dates and then divide the days attribute of the timedelta object by 7.
For example, this is the current week we are on now.
time_delta = (dt.datetime.today() - dt.datetime(2021, 1, 1))
The output is a datetime timedelta object
datetime.timedelta(days=75, seconds=84904, microseconds=144959)
For your problem, you'd do something like this
time_delta = int((df["Date_of_publication"] - df["Year"].days / 7)
The output would be a number that is the current week since date_of_publication
Upvotes: 0