user3420834
user3420834

Reputation: 57

Python Pandas Dataframe manipulation to form larger dataframe

Im looking to turn

pd.DataFrame({'Id':['A','B'],
              'beginDate':['2017-01-01','2017-01-03'],
              'endDate':['2017-01-02','2017-01-05']})

    Id  beginDate   endDate
0   A   2017-01-01  2017-01-02
1   B   2017-01-02  2017-01-05

Into

pd.DataFrame({'Id':['A','A','A','B','B','B','B'],
         'Date':['2017-01-01','2017-01-02','2017-01-03','2017-01-02',
                 '2017-01-03','2017-01-04','2017-01-05']})

    Date       Id
0   2017-01-01  A
1   2017-01-02  A
2   2017-01-03  A
3   2017-01-02  B
4   2017-01-03  B
5   2017-01-04  B
6   2017-01-05  B

The concept is to fill the second dataframe with all values between the begin and end date.

I can use a for loop successfully and append a new dataframe for each unique 'Id' but this is very slow.

The for loop uses the below by looping over each row.

def dfChange(uniqueId, startDate, endDate):

allDates = pd.date_range(startDate, endDate)
df = pd.DataFrame({'Date':allDates})
df['Id'] = uniqueId
return df

Is it possible to do this without using a for loop?

Upvotes: 2

Views: 42

Answers (1)

jezrael
jezrael

Reputation: 863166

You can use DataFrame.melt with set_index for reshape first.

Then convert index to to_datetime.

Last DataFrame.groupby with DataFrameGroupBy.resample, call ffill and reset_index.

All data are in index, so select no columns by [[]].

df = pd.DataFrame({'Id':['A','B'],
              'beginDate':['2017-01-01','2017-01-02'],
              'endDate':['2017-01-03','2017-01-05']})
print (df)
  Id   beginDate     endDate
0  A  2017-01-01  2017-01-03
1  B  2017-01-02  2017-01-05

df = df.melt('Id', value_name='Date', var_name='a').set_index('Date')
df.index = pd.to_datetime(df.index)
df=df.groupby('Id').resample('B')[[]].ffill().reset_index()
print (df)
  Id       Date
0  A 2017-01-01
1  A 2017-01-02
2  A 2017-01-03
3  B 2017-01-02
4  B 2017-01-03
5  B 2017-01-04
6  B 2017-01-05

Upvotes: 4

Related Questions