Reputation: 51
I have a small requirement to get a date mapping. I have the following code :
This code takes in the year and week number and gets the "Friday" of that week as the starting date.
from datetime import datetime
myDate = "2020 23"
df = datetime.strptime(myDate + ' 5', "%Y %W %w")
df = datetime.date(df)
The below code takes the starting date from above and gets the dates for the next 7 days.
import datetime
start = df
# build a simple range
dates = [start + datetime.timedelta(days=d) for d in range(7)]
dates = [str(d) for d in dates]
dates_df = pd.DataFrame(dates,columns = ['dates'])
How can this be turned into the following table with the naming conventions?
Table
All help is appreciated!
Thank You!
Upvotes: 0
Views: 444
Reputation: 2049
One thing to be aware of is that the datetime.datetime
week number (what is represented by %Y
in strptime
& strftime
) is 0-indexed. Whereas from your example data, you're using the 1-indexed version.
from datetime import datetime, timedelta
import pandas as pd
my_date = "2020 23"
start = datetime.strptime(my_date + " 5", "%Y %W %w").date() - timedelta(weeks=1)
dates = [start + timedelta(days=i) for i in range(14)]
date_strings = [d.strftime("%d-%m-%Y") for d in dates]
date_codes = ["{}{}_{}".format(*(d - timedelta(days=4)).isocalendar()) for d in dates]
dates_df = pd.DataFrame({"Year_Week": date_codes, "Date": date_strings})
There's quite a lot going on in some of that, so let's break it down:
from datetime import datetime, timedelta
import pandas as pd
Firstly, we parse the input string, and we can immediately extract just the date component in the same line using .date()
:
datetime.strptime(my_date + " 5", "%Y %W %w").date()
As I mentioned above, the datetime
week number is 0-indexed, so when we do strptime
with %Y
on 23
, we get the 24th week. This means we then need to jump back by a week to get the day we actually wanted:
start = datetime.strptime(my_date + " 5", "%Y %W %w").date() - timedelta(weeks=1)
Finally we use the list comprehension as you had:
dates = [start + timedelta(days=i) for i in range(14)]
strftime()
is the reverse of strptime()
, and the format for your Date
column is dd-mm-yyyy
, which corresponds to the format string used here:
date_strings = [d.strftime("%d-%m-%Y") for d in dates]
The next line has the most going on at once:
Firstly, note that date
and datetime
objects have the isocalendar()
method which returns a tuple of (ISO year, ISO week, ISO day). ISO weeks run from Monday=1 to Sunday=7, and weeks start numbering at 1, not 0.
Your "weeks" therefore exactly match but are shifted by 4 days to start with Friday=1. So each of your dates has the corresponding ISO week/day number of the day 4 days previously. So we shift your date back by 4 days and then extract the year/week/day numbers: d - timedelta(days=4)).isocalendar()
With "{}{}_{}".format()
we set up a template to drop in the year/week/day values. Each pair of curly brackets {}
indicates where each value passed to format()
should be inserted into the string template. For example,
"{}{}_{}".format(2020, 23, 4)
would give us "202023_4"
, the code for 8th June 2020.
Using the *
on the result from our .isocalendar()
function call 'unpacks' the tuple to pass its values individually into format()
Putting it all together as a list comprehension, again using the list of dates we created before:
date_codes = ["{}{}_{}".format(*(d - timedelta(days=4)).isocalendar()) for d in dates]
We pass the data in as a dictionary in the format {"Column Name": column_values_list}
:
dates_df = pd.DataFrame({"Year_Week": date_codes, "Date": date_strings})
We could wrap the whole thing up as a function, which would also mean we don't have to use a string as our starting point - we can just pass in the right numbers directly:
from datetime import date, timedelta
import pandas as pd
def create_table(year, week, n=1):
start = date.fromisocalendar(year, week, 5)
dates = [start + timedelta(days=i) for i in range(n * 7)]
date_strings = [d.strftime("%d-%m-%Y") for d in dates]
date_codes = ["{}{}_{}".format(*(d - timedelta(days=4)).isocalendar()) for d in dates]
return pd.DataFrame({"Year_Week": date_codes, "Date": date_strings})
table = create_table(2020, 23, 2)
print(table)
Outputs:
Year_Week Date
0 202023_1 05-06-2020
1 202023_2 06-06-2020
2 202023_3 07-06-2020
3 202023_4 08-06-2020
4 202023_5 09-06-2020
5 202023_6 10-06-2020
6 202023_7 11-06-2020
7 202024_1 12-06-2020
8 202024_2 13-06-2020
9 202024_3 14-06-2020
10 202024_4 15-06-2020
11 202024_5 16-06-2020
12 202024_6 17-06-2020
13 202024_7 18-06-2020
Note that we have an optional third parameter n
to say how many weeks we want to generate the table for (default is 1). Also, because we're passing the year and week number directly, we can use the built-in date.fromisocalendar()
method, which is the reverse of the .isocalendar()
method. This takes the year, week & day and returns the corresponding date directly.
.fromisocalendar()
date.fromisocalendar()
was only introduced in Python 3.7, so if you're using an earlier version of Python you'll have to use the more complicated technique of writing out a string to then parse with strptime()
.
However, if you're using Python 3.6, there were some new formatting directives added for parsing ISO week dates that make this slightly easier and we can use the approach from this SO answer:
def date_from_isoweek(year, week, day):
return datetime.strptime(f"{year:04d} {week:02d} {day:d}", "%G %V %u").date()
We're using an f-string to create the date string to then parse in as a datetime
, from which we extract the date
component. The e.g. :02d
after week
inside the braces {}
ensures that it's formatted correctly as a 2-digit decimal left-padded with 0
(which we need if our week number is between 1-9).
This is quite easy, as there's a built-in pandas function called date_range()
which accepts a start
and end
parameter which can either be date
/datetime
objects or strings. It's designed for creating a datetime index, but it's very easy to turn it into a list of dates.
dates = pd.date_range(start, end).date.tolist()
If we refactor our code to separate out the part that creates the list of dates we want in our table, and the part that then formats them to create the data for our columns and puts it into our dataframe, we get this:
def create_table_from_dates(dates):
date_strings = [d.strftime("%d-%m-%Y") for d in dates]
date_codes = [(d - timedelta(days=4)).strftime("%G%V_%u") for d in dates]
return pd.DataFrame({"Year_Week": date_codes, "Date": date_strings})
def create_table_between_dates(start, end):
dates = pd.date_range(start, end).date.tolist()
return create_table_from_dates(dates)
def create_table_by_weeks(year, week, n=1):
friday_as_isoweek_string = f"{year:04d} {week:02d} 5"
start = datetime.strptime(friday_as_isoweek_string, "%G %V %u").date()
dates = [start + timedelta(days=i) for i in range(n * 7)]
return create_table_from_dates(dates)
table_by_weeks = create_table_by_weeks(2020, 23, 2)
table_from_range = create_table_between_dates("2020-06-05", "2020-06-28")
The create_table_by_weeks()
has the same signature as our create_table()
function from the original answer. The create_table_between_dates()
accepts a start
and end
date, either as date objects or strings. Both of these functions produce the list of dates for the table, and then pass them on to the create_table_from_dates()
function (at the top) to create the DataFrame.
The part of the code that determines what the Year_week
column looks like is this line in the create_table_from_dates()
function:
date_codes = [(d - timedelta(days=4)).strftime("%G%V_%u") for d in dates]
specifically the string "%G%V_%u"
inside the strftime()
method call. You can adjust this using the format codes set out in the table here: https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes
Remember: the way we get our codes is by cheating slightly: because your "weeks" are just the ISO calendar weeks, but shifted to start on a Friday we just 'steal' the ISO week and day number from four days previously. If you're just playing around with the order or extra characters that's fine: changing
"%G%V_%u"
to"%u_%G%V"
will change202023_1
to1_202023
. But if you wanted to include things like the actual date or weekday, you'd have to make sure you got those components from the true date (not the date 4 days previously).date_codes = [ (d - timedelta(days=4)).strftime("%G%V_%u") + d.strftime(" %a %d %b") for d in dates ]
would give us dates like
202023_1 Fri 05 Jun
If it is only the year/week/day you want to work with, we can extract this format string as a variable fmt
, and pass it into create_table_from_dates()
from the two other functions, and make it a keyword argument (with a default) for both of these:
def create_table_from_dates(dates, fmt):
date_strings = [d.strftime("%d-%m-%Y") for d in dates]
date_codes = [(d - timedelta(days=4)).strftime(fmt) for d in dates]
return pd.DataFrame({"Year_Week": date_codes, "Date": date_strings})
def create_table_between_dates(start, end, fmt="%G%V_%u"):
dates = pd.date_range(start, end).date.tolist()
return create_table_from_dates(dates, fmt)
def create_table_by_weeks(year, week, n=1, fmt="%G%V_%u"):
friday_as_isoweek_string = f"{year:04d} {week:02d} 5"
start = datetime.strptime(friday_as_isoweek_string, "%G %V %u").date()
dates = [start + timedelta(days=i) for i in range(n * 7)]
return create_table_from_dates(dates, fmt)
table = create_table_by_weeks(2020, 23, 2, fmt="%u_%G%V")
print(table)
Will give this output:
Year_Week Date
0 1_202023 05-06-2020
1 2_202023 06-06-2020
2 3_202023 07-06-2020
3 4_202023 08-06-2020
4 5_202023 09-06-2020
5 6_202023 10-06-2020
6 7_202023 11-06-2020
7 1_202024 12-06-2020
8 2_202024 13-06-2020
9 3_202024 14-06-2020
10 4_202024 15-06-2020
11 5_202024 16-06-2020
12 6_202024 17-06-2020
13 7_202024 18-06-2020
Upvotes: 2