Reputation: 75
I have df: while disease is binary column ( 0 or 1)
diagnosis_date id disease
2013-05-03 1 0
2013-05-08 1 0
2013-06-08 1 1
2013-01-01 2 0
.....
and I have range of dates- 2013-01-01 until 2013-12-31:
date_index=pd.date_range(start='1/1/2013', end='31/12/2013')
dates=pd.DataFrame(date_index,columns=['date'])
I want for each id in df, to set the date range as date_index, and if the date is the same as diagnosis date, to set the value like it in the disease column, otherwise the value will be set to zero. the desire df:
date id disease
01-01 1 0
02-01 1 0
03-01 1 0
...
05-03 1 0
05-04 1 0
...
06-08 1 1
....
12-31 1 0
01-01 2 1
01-02 2 0
...
Thanks
Upvotes: 0
Views: 385
Reputation: 153460
Try this..
df = pd.read_clipboard() #read in your dataframe from clipboard
dfp = df.pivot('diagnosis_date', 'id', 'disease') #reshape dataframe
dfp.index = pd.to_datetime(dfp.index) #cast index to datetime
dfp.reindex(pd.date_range('1/1/2013','12/31/2013'))\ #add rows with reindex and pd.date_range
.rename_axis('date').fillna(0)\ #fill with zeroes
.stack().rename('disease')\ #Reshape back to original shape
.reset_index()\
.sort_values(['id', 'date']) #and sort
Output:
date id disease
0 2013-01-01 1 0.0
2 2013-01-02 1 0.0
4 2013-01-03 1 0.0
6 2013-01-04 1 0.0
8 2013-01-05 1 0.0
.. ... .. ...
721 2013-12-27 2 0.0
723 2013-12-28 2 0.0
725 2013-12-29 2 0.0
727 2013-12-30 2 0.0
729 2013-12-31 2 0.0
Upvotes: 1
Reputation: 1614
Here you go:
date_index=pd.date_range(start='1/1/2013', end='31/12/2013')
dates = pd.DataFrame()
for i in df.id.unique():
dates=pd.concat([dates,pd.DataFrame({'date':date_index, 'id' : np.full(len(date_index),i)})])
df.diagnosis_date = pd.to_datetime(df['diagnosis_date'])
df1 = pd.merge(dates,df, left_on=['id','date'], right_on=['id','diagnosis_date'], how='left')[['date','id','disease']].fillna(0)
df1['disease'] = df1.disease.astype(int)
Tested and prints correctly.
Upvotes: 0