Reputation: 57
Im looking to turn
pd.DataFrame({'Id':['A','B'],
'beginDate':['2017-01-01','2017-01-03'],
'endDate':['2017-01-02','2017-01-05']})
Id beginDate endDate
0 A 2017-01-01 2017-01-02
1 B 2017-01-02 2017-01-05
Into
pd.DataFrame({'Id':['A','A','A','B','B','B','B'],
'Date':['2017-01-01','2017-01-02','2017-01-03','2017-01-02',
'2017-01-03','2017-01-04','2017-01-05']})
Date Id
0 2017-01-01 A
1 2017-01-02 A
2 2017-01-03 A
3 2017-01-02 B
4 2017-01-03 B
5 2017-01-04 B
6 2017-01-05 B
The concept is to fill the second dataframe with all values between the begin and end date.
I can use a for loop successfully and append a new dataframe for each unique 'Id' but this is very slow.
The for loop uses the below by looping over each row.
def dfChange(uniqueId, startDate, endDate):
allDates = pd.date_range(startDate, endDate)
df = pd.DataFrame({'Date':allDates})
df['Id'] = uniqueId
return df
Is it possible to do this without using a for loop?
Upvotes: 2
Views: 42
Reputation: 863166
You can use DataFrame.melt
with set_index
for reshape first.
Then convert index to to_datetime
.
Last DataFrame.groupby
with DataFrameGroupBy.resample
, call ffill
and reset_index
.
All data are in index
, so select no columns by [[]]
.
df = pd.DataFrame({'Id':['A','B'],
'beginDate':['2017-01-01','2017-01-02'],
'endDate':['2017-01-03','2017-01-05']})
print (df)
Id beginDate endDate
0 A 2017-01-01 2017-01-03
1 B 2017-01-02 2017-01-05
df = df.melt('Id', value_name='Date', var_name='a').set_index('Date')
df.index = pd.to_datetime(df.index)
df=df.groupby('Id').resample('B')[[]].ffill().reset_index()
print (df)
Id Date
0 A 2017-01-01
1 A 2017-01-02
2 A 2017-01-03
3 B 2017-01-02
4 B 2017-01-03
5 B 2017-01-04
6 B 2017-01-05
Upvotes: 4