Reputation: 1661
I have a dataframe with a start date and a finish date for several people :
# input df
df_input = pd.DataFrame([
["John", "2018-08-03", "2018-08-05"],
["Jack", "2018-08-20", "2018-08-21"]
])
df_input.columns = ["name", "start_day", "finish_day"]
I want to create a date range for every people (I want a pd.Series that contains date range) :
# output df
df_output = pd.DataFrame([
["John", "2018-08-03", "2018-08-05", "['2018-08-03', '2018-08-04', '2018-08-05']"],
["Jack", "2018-08-20", "2018-08-21", "['2018-08-20', '2018-08-21']"]
])
df_output.columns = ["name", "start_day", "finish_day", "date_range"]
I don't know how to create that range.
Any idea ?
Upvotes: 2
Views: 2055
Reputation: 446
Challanging and interesting one! I think the following snippet gets pretty close to what you are asking, though the shape is a little different from the exact output you requested. Yet, the restructured shape of the output does contain the range of dates, the name and the end date.
import pandas as pd
df_input = pd.DataFrame([["John", "2018-08-03", "2018-08-05"],["Jack", "2018-08-20", "2018-08-21"]], columns=['Name','Start_Date','End_Date'])
df_input['Start_Date'] = pd.to_datetime(df_input['Start_Date'], format='%Y-%m-%d')
df_input['End_Date'] = pd.to_datetime(df_input['End_Date'], format='%Y-%m-%d')
df_input.set_index('Start_Date', inplace=True)
def reindex_by_date(df_input):
dates = pd.date_range(df_input.index.min(), df_input['End_Date'].min())
return df_input.reindex(dates).ffill()
finaldf = df_input.groupby('Name').apply(reindex_by_date)
finaldf
Upvotes: 2