Abhay kumar

Reputation: 225

How to get a date from year, month, week of month and Day of week in Pandas?

I have a Pandas dataframe with the columns Month, WeekOfMonth, DayOfWeek and Year (shown as an image in the original post).

I want to create a new column containing the exact date derived from all of the columns above. The code should look something like this:

df['Date'] = pd.to_datetime(df['Month']+df['WeekOfMonth']+df['DayOfWeek']+df['Year'])

Upvotes: 2

Views: 1519

Answers (2)

SeaBean

Reputation: 23217

Let's approach it in 3 steps as follows:

  1. Get the date of month start Month_Start from Year and Month
  2. Calculate the date offsets DateOffset relative to Month_Start from WeekOfMonth and DayOfWeek
  3. Get the actual date Date from Month_Start and DateOffset

Here's the code:

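import time
import pandas as pd

# Step 1: first day of each month, parsed from strings such as "1995Dec01"
df['Month_Start'] = pd.to_datetime(df['Year'].astype(str) + df['Month'] + '01', format="%Y%b%d")

# Step 2: days from Month_Start; tm_wday and dt.dayofweek both count Monday as 0
df['DateOffset'] = (df['WeekOfMonth'] - 1) * 7 + df['DayOfWeek'].map(lambda x: time.strptime(x, '%A').tm_wday) - df['Month_Start'].dt.dayofweek

# Step 3: add the offset to the month start
df['Date'] = df['Month_Start'] + pd.to_timedelta(df['DateOffset'], unit='D')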

Output:

  Month  WeekOfMonth  DayOfWeek  Year Month_Start  DateOffset       Date
0   Dec            5  Wednesday  1995  1995-12-01          26 1995-12-27
1   Jan            3  Wednesday  2013  2013-01-01          15 2013-01-16
2   Oct            5     Friday  2018  2018-10-01          32 2018-11-02
3   Jun            2   Saturday  1980  1980-06-01           6 1980-06-07
4   Jan            5     Monday  1976  1976-01-01          25 1976-01-26

The Date column now contains the dates derived from the information from other columns.
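Note that row 2 in the output above lands in November even though Month is Oct, because week 5 can run past the end of the month. Below is a minimal sketch for flagging such rows, assuming the same column names as above. The `Overflow` column name is hypothetical, and the two sample rows are taken from the output table.

```python
import time
import pandas as pd

df = pd.DataFrame({
    "Month": ["Dec", "Oct"],
    "WeekOfMonth": [5, 5],
    "DayOfWeek": ["Wednesday", "Friday"],
    "Year": [1995, 2018],
})

# Same three steps as in the answer above
month_start = pd.to_datetime(df["Year"].astype(str) + df["Month"] + "01", format="%Y%b%d")
weekday = df["DayOfWeek"].map(lambda x: time.strptime(x, "%A").tm_wday)
offset = (df["WeekOfMonth"] - 1) * 7 + weekday - month_start.dt.dayofweek
df["Date"] = month_start + pd.to_timedelta(offset, unit="D")

# Hypothetical helper column: True where the computed date left the stated month
df["Overflow"] = df["Date"].dt.month != month_start.dt.month
```

Rows with `Overflow` set to True can then be inspected or dropped, depending on how your data defines week 5.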

You can remove the working interim columns, if you like, as follows:

df = df.drop(['Month_Start', 'DateOffset'], axis=1)

Upvotes: 0

Fran Verdejo

Reputation: 79

I was able to find a workaround for your case. You will need to define the dictionaries for the months and the days of the week.

month = {"Jan":"01", "Feb":"02", "Mar":"03", "Apr": "04", "May":"05", "Jun":"06", "Jul":"07", "Aug":"08", "Sep":"09", "Oct":"10", "Nov":"11", "Dec":"12"}

week = {"Monday":1,"Tuesday":2,"Wednesday":3,"Thursday":4,"Friday":5,"Saturday":6,"Sunday":7}

With these dictionaries, the transformation I used on a custom dataframe was:

import pandas as pd

rows = [["Dec", 5, "Wednesday", "1995"],
        ["Jan", 3, "Wednesday", "2013"]]
df = pd.DataFrame(rows, columns=["Month", "Week", "Weekday", "Year"])

# Day of month = (week - 1) * 7 + weekday number (Monday = 1)
day = (df["Week"] - 1) * 7 + df["Weekday"].map(week)
df['Date'] = (df["Year"] + "-" + df["Month"].map(month) + "-" + day.astype(str)).astype('datetime64[ns]')

However, you have to be careful. Some of the data you posted as an example produces dates that exceed the valid range for the month. For example, for

row = ["Oct",5,"Friday","2018"]

The resulting date string is 2018-10-33, which is not a valid date. I recommend adding some logic to filter your data in order to avoid this kind of problem.
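One possible way to do that filtering, sketched under the assumption that invalid dates should simply become missing values: build the date string as before, then parse it with pd.to_datetime and errors="coerce", which turns unparseable strings into NaT instead of raising. The sample rows and the names `day`, `date_str` and `valid` are illustrative only.

```python
import pandas as pd

month = {"Jan": "01", "Feb": "02", "Mar": "03", "Apr": "04", "May": "05", "Jun": "06",
         "Jul": "07", "Aug": "08", "Sep": "09", "Oct": "10", "Nov": "11", "Dec": "12"}
week = {"Monday": 1, "Tuesday": 2, "Wednesday": 3, "Thursday": 4,
        "Friday": 5, "Saturday": 6, "Sunday": 7}

rows = [["Dec", 5, "Wednesday", "1995"],
        ["Oct", 5, "Friday", "2018"]]
df = pd.DataFrame(rows, columns=["Month", "Week", "Weekday", "Year"])

# Day of month as in the answer above, zero-padded to two digits
day = (df["Week"] - 1) * 7 + df["Weekday"].map(week)
date_str = df["Year"] + "-" + df["Month"].map(month) + "-" + day.astype(str).str.zfill(2)

# Strings such as "2018-10-33" cannot be parsed and become NaT
df["Date"] = pd.to_datetime(date_str, format="%Y-%m-%d", errors="coerce")

# Keep only the rows whose date is valid
valid = df.dropna(subset=["Date"])
```

Whether to drop the NaT rows or handle them differently depends on what an out-of-range week should mean in your data.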

Upvotes: 1
