Reputation: 147
I'm trying to figure out how to expand a couple of rows with a date range in Excel using Pandas. Below are two records that I want to expand in intervals of 7 days.
Below is what I expect to see as output. Should there not be enough for 7 days, I still want a row that shows the remaining days left.
Below is the code I drafted up. I'm quite new to Pandas, so I'm not sure if I'm using the methods correctly here. If anyone would be able to help, that would be great!
df = pd.read_excel(path_link + input_file_name)
time_series = pd.DataFrame({
    'Product': df.Product,
    'Date': pd.date_range(df.Start_Date, df.End_Date)
})
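For context on why this draft fails: `pd.date_range` expects scalar start and end values, not whole columns, so passing `df.Start_Date` / `df.End_Date` raises an error. A minimal row-by-row sketch (column names assumed from the draft, sample data made up):

```python
import pandas as pd

# Stand-in for the Excel input (names assumed from the draft above)
df = pd.DataFrame({'Product': ['Apple'],
                   'Start_Date': [pd.Timestamp('2019-03-01')],
                   'End_Date': [pd.Timestamp('2019-03-05')]})

# pd.date_range takes one scalar start and one scalar end,
# so expand each row individually and collect the results
rows = [{'Product': r.Product, 'Date': d}
        for r in df.itertuples()
        for d in pd.date_range(r.Start_Date, r.End_Date)]
time_series = pd.DataFrame(rows)
print(time_series)
```

This produces one row per day per product; the answers below show how to get 7-day windows instead.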
EDIT
Big thank you to everyone below for responding! I learned a lot from your responses. Below is the final code; I marked Datanovice's as the accepted answer.
import pandas as pd

# Data frame
data = [
    ['Apple', '3/1/2019', '4/1/2019'],
    ['Pear', '2/5/2019', '3/4/2019']
]
df = pd.DataFrame(data, columns=['Product', 'Start_Date', 'End_Date'])

# Change data type for the date columns
df['Start_Date'] = pd.to_datetime(df['Start_Date'])
df['End_Date'] = pd.to_datetime(df['End_Date'])

# Un-pivot the table and expand each product's start dates in 8-day steps
df2 = pd.melt(df, id_vars='Product').set_index('value')\
    .groupby('Product').resample('8D').sum().drop(['variable', 'Product'], axis=1)\
    .reset_index()

# Create the end-date column, 7 days after each start date
df2['end_date'] = df2['value'] + pd.DateOffset(days=7)

# Find the index of each product's last row (rows 3 and 7 here)
idx = df2.drop_duplicates(subset='Product', keep='last').index

# Replace those rows' end dates with the true End_Date from the original df
df2.loc[idx, 'end_date'] = df2['Product'].map(df.set_index('Product')['End_Date'])
print(df2)
Upvotes: 2
Views: 330
Reputation: 23099
IIUC, one idea is to resample with melt to get your start & end dates into one column, and use pd.DateOffset to assign your days. We still need to handle the max end date per product; we can do that with a simple map and .loc assignment after finding each product's last row with .drop_duplicates.
First, ensure both Start_Date and End_Date are proper datetimes: df['your_date_col'] = pd.to_datetime(df['your_date_col'])
df2 = pd.melt(df, id_vars='Product').set_index('value')\
    .groupby('Product').resample('8D').sum().drop(['variable', 'Product'], axis=1)\
    .reset_index()

df2['end_date'] = df2['value'] + pd.DateOffset(days=7)

idx = df2.drop_duplicates(subset='Product', keep='last').index

df2.loc[idx, 'end_date'] = df2['Product'].map(df.set_index('Product')['End_Date'])
print(df2)
Product value end_date
0 Apple 2019-03-01 2019-03-08
1 Apple 2019-03-09 2019-03-16
2 Apple 2019-03-17 2019-03-24
3 Apple 2019-03-25 2019-04-01
4 Pear 2019-02-05 2019-02-12
5 Pear 2019-02-13 2019-02-20
6 Pear 2019-02-21 2019-02-28
7 Pear 2019-03-01 2019-03-04
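As a cross-check (not part of the original answer), the same windows can be rebuilt with an explicit loop: step each product's window start forward with an 8-day `pd.date_range`, then cap each window end at the product's true End_Date. This trades the melt/resample chain for readability:

```python
import pandas as pd

df = pd.DataFrame({'Product': ['Apple', 'Pear'],
                   'Start_Date': pd.to_datetime(['3/1/2019', '2/5/2019']),
                   'End_Date': pd.to_datetime(['4/1/2019', '3/4/2019'])})

rows = []
for r in df.itertuples():
    # window starts advance 8 days at a time ...
    for start in pd.date_range(r.Start_Date, r.End_Date, freq='8D'):
        # ... and each window end is capped at the product's true End_Date
        end = min(start + pd.DateOffset(days=7), r.End_Date)
        rows.append({'Product': r.Product, 'value': start, 'end_date': end})

out = pd.DataFrame(rows)
print(out)
```

This reproduces the eight rows shown above, including the short final windows ending on 2019-04-01 and 2019-03-04.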
Upvotes: 2
Reputation: 1795
This answer will handle the max end date per product as well:
import pandas as pd

df = pd.DataFrame({'Product': ['Apple', 'Pear'],
                   'Start_Date': ['3/1/2019', '2/5/2019'],
                   'End_Date': ['4/1/2019', '3/4/2019']})

# Stack the start/end dates into one long column
df2 = df.set_index('Product').stack()
df2 = df2.reset_index(name='dates')
df2['dates'] = pd.to_datetime(list(df2['dates']))

# Resample each product's dates in 8-day steps
df3 = df2.set_index('dates').groupby('Product').resample('8D').bfill()
df3 = df3.reset_index('dates')
df3 = df3[['dates', 'Product']].reset_index(drop=True)

# df4 was not defined in the original snippet; presumably the
# original start/end dates per product, which the outer merge below
# uses to restore each product's true End_Date
df4 = df2[['dates', 'Product']]

pd.merge(df3, df4, how='outer', on=['Product', 'dates']).drop_duplicates().sort_values(['Product', 'dates'])
dates Product
0 2019-03-01 Apple
1 2019-03-09 Apple
2 2019-03-17 Apple
3 2019-03-25 Apple
8 2019-04-01 Apple
4 2019-02-05 Pear
5 2019-02-13 Pear
6 2019-02-21 Pear
7 2019-03-01 Pear
9 2019-03-04 Pear
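The key step in this answer is the outer merge: the resampled frame stops short of each product's true end date, and merging with the original dates re-introduces the missing rows (indices 8 and 9 above). A tiny sketch of that behavior with made-up data for one product:

```python
import pandas as pd

# Resampled dates for one product: the true end date 2019-03-04 is missing
left = pd.DataFrame({'dates': pd.to_datetime(['2019-02-21', '2019-03-01']),
                     'Product': ['Pear', 'Pear']})
# Original dates for the same product, including the true end date
right = pd.DataFrame({'dates': pd.to_datetime(['2019-02-21', '2019-03-04']),
                      'Product': ['Pear', 'Pear']})

# An outer merge keeps rows from both sides, giving the union of dates
merged = (pd.merge(left, right, how='outer', on=['Product', 'dates'])
            .drop_duplicates().sort_values(['Product', 'dates']))
print(merged)
```

The result has three rows: the two resampled dates plus the end date that only the right side knew about.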
Upvotes: 1