Charles R
Charles R

Reputation: 1661

Create a date range serie with Pandas

I have a dataframe with a start date and a finish date for several people :

# input df    
df_input = pd.DataFrame([
        ["John", "2018-08-03", "2018-08-05"],
        ["Jack", "2018-08-20", "2018-08-21"]
])
df_input.columns = ["name", "start_day", "finish_day"]

I want to create a date range for every people (I want a pd.Series that contains date range) :

# output df
df_output = pd.DataFrame([
    ["John", "2018-08-03", "2018-08-05", "['2018-08-03', '2018-08-04', '2018-08-05']"],
    ["Jack", "2018-08-20", "2018-08-21", "['2018-08-20', '2018-08-21']"]
])
df_output.columns = ["name", "start_day", "finish_day", "date_range"]

I don't know how to create that range.
Any idea ?

Upvotes: 2

Views: 2055

Answers (1)

Nicola
Nicola

Reputation: 446

Challanging and interesting one! I think the following snippet gets pretty close to what you are asking, though the shape is a little different from the exact output you requested. Yet, the restructured shape of the output does contain the range of dates, the name and the end date.

import pandas as pd
df_input = pd.DataFrame([["John", "2018-08-03", "2018-08-05"],["Jack", "2018-08-20", "2018-08-21"]], columns=['Name','Start_Date','End_Date'])
df_input['Start_Date'] =  pd.to_datetime(df_input['Start_Date'], format='%Y-%m-%d')
df_input['End_Date'] =  pd.to_datetime(df_input['End_Date'], format='%Y-%m-%d')
df_input.set_index('Start_Date', inplace=True)
def reindex_by_date(df_input):
    dates = pd.date_range(df_input.index.min(), df_input['End_Date'].min())
    return df_input.reindex(dates).ffill()
finaldf = df_input.groupby('Name').apply(reindex_by_date)
finaldf

Upvotes: 2

Related Questions