user6395790
user6395790

Reputation: 69

Python: Pandas dataframe get the year to which the week number belongs and not the year of the date

I have a csv-file: https://data.rivm.nl/covid-19/COVID-19_aantallen_gemeente_per_dag.csv

I want to use it to provide insight into the corona deaths per week.

  df = pd.read_csv("covid.csv", error_bad_lines=False, sep=";")
  df = df.loc[df['Deceased'] > 0]
  df["Date_of_publication"] = pd.to_datetime(df["Date_of_publication"])
  df["Week"] = df["Date_of_publication"].dt.isocalendar().week
  df["Year"] = df["Date_of_publication"].dt.year
  df = df[["Week", "Year", "Municipality_name", "Deceased"]]
  df = df.groupby(by=["Week", "Year", "Municipality_name"]).agg({"Deceased" : "sum"})
  df = df.sort_values(by=["Year", "Week"])
  print(df)

Everything seems to be working fine except for the first 3 days of 2021. The first 3 days of 2021 are part of the last week (53) of 2020: http://week-number.net/calendar-with-week-numbers-2021.html.

When I print the dataframe this is the result:

 53   2021 Winterswijk               1
           Woudenberg                1
           Zaanstad                  1
           Zeist                     2
           Zutphen                   1

So basically what I'm looking for is a way where this line returns the year of the week number and not the year of the date:

  df["Year"] = df["Date_of_publication"].dt.year

Upvotes: 3

Views: 1118

Answers (2)

SeaBean
SeaBean

Reputation: 23217

You can use dt.isocalendar().year to setup df["Year"]:

df["Year"] = df["Date_of_publication"].dt.isocalendar().year

You will get year 2020 for date of 2021-01-01 but will get back to year 2021 for date of 2021-01-04 by this.

This is just similar to how you used dt.isocalendar().week for setting up df["Week"]. Since they are both basing on the same tuple (year, week, day) returned by dt.isocalendar(), they would always be in sync.

Demo

date_s = pd.Series(pd.date_range(start='2021-01-01', periods=5, freq='1D'))

date_s

    0
0   2021-01-01
1   2021-01-02
2   2021-01-03
3   2021-01-04
4   2021-01-05

date_s.dt.isocalendar()

   year  week  day
0  2020    53    5
1  2020    53    6
2  2020    53    7
3  2021     1    1
4  2021     1    2

Upvotes: 4

unltd_J
unltd_J

Reputation: 494

You can simply subtract the two dates and then divide the days attribute of the timedelta object by 7.

For example, this is the current week we are on now.

time_delta = (dt.datetime.today() - dt.datetime(2021, 1, 1))

The output is a datetime timedelta object

datetime.timedelta(days=75, seconds=84904, microseconds=144959)

For your problem, you'd do something like this

time_delta = int((df["Date_of_publication"] - df["Year"].days / 7)

The output would be a number that is the current week since date_of_publication

Upvotes: 0

Related Questions