nay
nay

Reputation: 75

pandas set value if dates are the same in two diffrent df according to dates range

I have df: while disease is binary column ( 0 or 1)

diagnosis_date    id  disease
2013-05-03         1     0
2013-05-08         1     0
2013-06-08         1     1
2013-01-01         2     0 
.....

and I have range of dates- 2013-01-01 until 2013-12-31:

date_index=pd.date_range(start='1/1/2013', end='31/12/2013')
dates=pd.DataFrame(date_index,columns=['date'])

I want for each id in df, to set the date range as date_index, and if the date is the same as diagnosis date, to set the value like it in the disease column, otherwise the value will be set to zero. the desire df:

date    id    disease
01-01    1       0
02-01     1      0
03-01     1      0
...
05-03     1      0
05-04     1      0
...
06-08    1      1
 ....
12-31     1      0
01-01     2      1
01-02     2      0 
...  

Thanks

Upvotes: 0

Views: 385

Answers (2)

Scott Boston
Scott Boston

Reputation: 153460

Try this..

df = pd.read_clipboard() #read in your dataframe from clipboard

dfp = df.pivot('diagnosis_date', 'id', 'disease') #reshape dataframe
dfp.index = pd.to_datetime(dfp.index) #cast index to datetime

dfp.reindex(pd.date_range('1/1/2013','12/31/2013'))\  #add rows with reindex and pd.date_range
   .rename_axis('date').fillna(0)\  #fill with zeroes
   .stack().rename('disease')\  #Reshape back to original shape
   .reset_index()\
   .sort_values(['id', 'date'])  #and sort

Output:

          date  id  disease
0   2013-01-01   1      0.0
2   2013-01-02   1      0.0
4   2013-01-03   1      0.0
6   2013-01-04   1      0.0
8   2013-01-05   1      0.0
..         ...  ..      ...
721 2013-12-27   2      0.0
723 2013-12-28   2      0.0
725 2013-12-29   2      0.0
727 2013-12-30   2      0.0
729 2013-12-31   2      0.0

Upvotes: 1

sharathnatraj
sharathnatraj

Reputation: 1614

Here you go:

date_index=pd.date_range(start='1/1/2013', end='31/12/2013')
dates = pd.DataFrame()
for i in df.id.unique():
    dates=pd.concat([dates,pd.DataFrame({'date':date_index, 'id' : np.full(len(date_index),i)})])
df.diagnosis_date = pd.to_datetime(df['diagnosis_date'])
df1 = pd.merge(dates,df, left_on=['id','date'], right_on=['id','diagnosis_date'], how='left')[['date','id','disease']].fillna(0)
df1['disease'] = df1.disease.astype(int)

Tested and prints correctly.

Upvotes: 0

Related Questions