T.S
T.S

Reputation: 51

Python - Dealing with Dates

I have a small requirement to get a date mapping. I have the following code :

This code takes in the year and week number and gets the "Friday" of that week as the starting date.

from datetime import datetime
myDate = "2020 23"
df = datetime.strptime(myDate + ' 5', "%Y %W %w")
df = datetime.date(df)

The below code takes the starting date from above and gets the dates for the next 7 days.

import datetime
start = df
# build a simple range
dates = [start + datetime.timedelta(days=d) for d in range(7)]
dates = [str(d) for d in dates]
dates_df = pd.DataFrame(dates,columns = ['dates'])

How can this be turned into the following table with the naming conventions?

Table

enter image description here

All help is appreciated!

Thank You!

Upvotes: 0

Views: 444

Answers (1)

Tim
Tim

Reputation: 2049

One thing to be aware of is that the datetime.datetime week number (what is represented by %Y in strptime & strftime) is 0-indexed. Whereas from your example data, you're using the 1-indexed version.

from datetime import datetime, timedelta
import pandas as pd

my_date = "2020 23"

start = datetime.strptime(my_date + " 5", "%Y %W %w").date() - timedelta(weeks=1)
dates = [start + timedelta(days=i) for i in range(14)]

date_strings = [d.strftime("%d-%m-%Y") for d in dates]
date_codes = ["{}{}_{}".format(*(d - timedelta(days=4)).isocalendar()) for d in dates]

dates_df = pd.DataFrame({"Year_Week": date_codes, "Date": date_strings})

There's quite a lot going on in some of that, so let's break it down:

Import required classes:

from datetime import datetime, timedelta
import pandas as pd

Set up array of required dates:

Firstly, we parse the input string, and we can immediately extract just the date component in the same line using .date():

datetime.strptime(my_date + " 5", "%Y %W %w").date()

As I mentioned above, the datetime week number is 0-indexed, so when we do strptime with %Y on 23, we get the 24th week. This means we then need to jump back by a week to get the day we actually wanted:

start = datetime.strptime(my_date + " 5", "%Y %W %w").date() - timedelta(weeks=1)

Finally we use the list comprehension as you had:

dates = [start + timedelta(days=i) for i in range(14)]

Create our DataFrame columns:

strftime() is the reverse of strptime(), and the format for your Date column is dd-mm-yyyy, which corresponds to the format string used here:

date_strings = [d.strftime("%d-%m-%Y") for d in dates]

The next line has the most going on at once:

  • Firstly, note that date and datetime objects have the isocalendar() method which returns a tuple of (ISO year, ISO week, ISO day). ISO weeks run from Monday=1 to Sunday=7, and weeks start numbering at 1, not 0.

  • Your "weeks" therefore exactly match but are shifted by 4 days to start with Friday=1. So each of your dates has the corresponding ISO week/day number of the day 4 days previously. So we shift your date back by 4 days and then extract the year/week/day numbers: d - timedelta(days=4)).isocalendar()

  • With "{}{}_{}".format() we set up a template to drop in the year/week/day values. Each pair of curly brackets {} indicates where each value passed to format() should be inserted into the string template. For example,

    "{}{}_{}".format(2020, 23, 4)

    would give us "202023_4", the code for 8th June 2020.

  • Using the * on the result from our .isocalendar() function call 'unpacks' the tuple to pass its values individually into format()

Putting it all together as a list comprehension, again using the list of dates we created before:

date_codes = ["{}{}_{}".format(*(d - timedelta(days=4)).isocalendar()) for d in dates]

Building the DataFrame

We pass the data in as a dictionary in the format {"Column Name": column_values_list}:

dates_df = pd.DataFrame({"Year_Week": date_codes, "Date": date_strings})


We could wrap the whole thing up as a function, which would also mean we don't have to use a string as our starting point - we can just pass in the right numbers directly:

from datetime import date, timedelta
import pandas as pd

def create_table(year, week, n=1):
    start = date.fromisocalendar(year, week, 5)
    dates = [start + timedelta(days=i) for i in range(n * 7)]
    date_strings = [d.strftime("%d-%m-%Y") for d in dates]
    date_codes = ["{}{}_{}".format(*(d - timedelta(days=4)).isocalendar()) for d in dates]
    return pd.DataFrame({"Year_Week": date_codes, "Date": date_strings})

table = create_table(2020, 23, 2)
print(table)

Outputs:

   Year_Week        Date
0   202023_1  05-06-2020
1   202023_2  06-06-2020
2   202023_3  07-06-2020
3   202023_4  08-06-2020
4   202023_5  09-06-2020
5   202023_6  10-06-2020
6   202023_7  11-06-2020
7   202024_1  12-06-2020
8   202024_2  13-06-2020
9   202024_3  14-06-2020
10  202024_4  15-06-2020
11  202024_5  16-06-2020
12  202024_6  17-06-2020
13  202024_7  18-06-2020

Note that we have an optional third parameter n to say how many weeks we want to generate the table for (default is 1). Also, because we're passing the year and week number directly, we can use the built-in date.fromisocalendar() method, which is the reverse of the .isocalendar() method. This takes the year, week & day and returns the corresponding date directly.


Update: adaptions for Python 3.6 compatibility and flexible input

Getting start date in Python 3.6 without using .fromisocalendar()

date.fromisocalendar() was only introduced in Python 3.7, so if you're using an earlier version of Python you'll have to use the more complicated technique of writing out a string to then parse with strptime().

However, if you're using Python 3.6, there were some new formatting directives added for parsing ISO week dates that make this slightly easier and we can use the approach from this SO answer:

def date_from_isoweek(year, week, day):
    return datetime.strptime(f"{year:04d} {week:02d} {day:d}", "%G %V %u").date()

We're using an f-string to create the date string to then parse in as a datetime, from which we extract the date component. The e.g. :02d after week inside the braces {} ensures that it's formatted correctly as a 2-digit decimal left-padded with 0 (which we need if our week number is between 1-9).

Allowing input as a start and end date

This is quite easy, as there's a built-in pandas function called date_range() which accepts a start and end parameter which can either be date/datetime objects or strings. It's designed for creating a datetime index, but it's very easy to turn it into a list of dates.

dates = pd.date_range(start, end).date.tolist()

Putting it together

If we refactor our code to separate out the part that creates the list of dates we want in our table, and the part that then formats them to create the data for our columns and puts it into our dataframe, we get this:

def create_table_from_dates(dates):
    date_strings = [d.strftime("%d-%m-%Y") for d in dates]
    date_codes = [(d - timedelta(days=4)).strftime("%G%V_%u") for d in dates]
    return pd.DataFrame({"Year_Week": date_codes, "Date": date_strings})


def create_table_between_dates(start, end):
    dates = pd.date_range(start, end).date.tolist()
    return create_table_from_dates(dates)


def create_table_by_weeks(year, week, n=1):
    friday_as_isoweek_string = f"{year:04d} {week:02d} 5"
    start = datetime.strptime(friday_as_isoweek_string, "%G %V %u").date()
    dates = [start + timedelta(days=i) for i in range(n * 7)]
    return create_table_from_dates(dates)

table_by_weeks = create_table_by_weeks(2020, 23, 2)
table_from_range = create_table_between_dates("2020-06-05", "2020-06-28")

The create_table_by_weeks() has the same signature as our create_table() function from the original answer. The create_table_between_dates() accepts a start and end date, either as date objects or strings. Both of these functions produce the list of dates for the table, and then pass them on to the create_table_from_dates() function (at the top) to create the DataFrame.

Changing the format of the output strings

The part of the code that determines what the Year_week column looks like is this line in the create_table_from_dates() function:

date_codes = [(d - timedelta(days=4)).strftime("%G%V_%u") for d in dates]

specifically the string "%G%V_%u" inside the strftime() method call. You can adjust this using the format codes set out in the table here: https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes

Remember: the way we get our codes is by cheating slightly: because your "weeks" are just the ISO calendar weeks, but shifted to start on a Friday we just 'steal' the ISO week and day number from four days previously. If you're just playing around with the order or extra characters that's fine: changing "%G%V_%u" to "%u_%G%V" will change 202023_1 to 1_202023. But if you wanted to include things like the actual date or weekday, you'd have to make sure you got those components from the true date (not the date 4 days previously).

date_codes = [
    (d - timedelta(days=4)).strftime("%G%V_%u") + d.strftime(" %a %d %b")
    for d in dates
]

would give us dates like 202023_1 Fri 05 Jun

If it is only the year/week/day you want to work with, we can extract this format string as a variable fmt, and pass it into create_table_from_dates() from the two other functions, and make it a keyword argument (with a default) for both of these:

def create_table_from_dates(dates, fmt):
    date_strings = [d.strftime("%d-%m-%Y") for d in dates]
    date_codes = [(d - timedelta(days=4)).strftime(fmt) for d in dates]
    return pd.DataFrame({"Year_Week": date_codes, "Date": date_strings})


def create_table_between_dates(start, end, fmt="%G%V_%u"):
    dates = pd.date_range(start, end).date.tolist()
    return create_table_from_dates(dates, fmt)


def create_table_by_weeks(year, week, n=1, fmt="%G%V_%u"):
    friday_as_isoweek_string = f"{year:04d} {week:02d} 5"
    start = datetime.strptime(friday_as_isoweek_string, "%G %V %u").date()
    dates = [start + timedelta(days=i) for i in range(n * 7)]
    return create_table_from_dates(dates, fmt)


table = create_table_by_weeks(2020, 23, 2, fmt="%u_%G%V")
print(table)

Will give this output:

   Year_Week        Date
0   1_202023  05-06-2020
1   2_202023  06-06-2020
2   3_202023  07-06-2020
3   4_202023  08-06-2020
4   5_202023  09-06-2020
5   6_202023  10-06-2020
6   7_202023  11-06-2020
7   1_202024  12-06-2020
8   2_202024  13-06-2020
9   3_202024  14-06-2020
10  4_202024  15-06-2020
11  5_202024  16-06-2020
12  6_202024  17-06-2020
13  7_202024  18-06-2020

Upvotes: 2

Related Questions