Sang
Sang

Reputation: 75

How to cast (long to wide) dataframe based on some conditions

How can I convert this:

patient_id test    test_value      date_taken
11964    HBA1C         8.60        2017-06-14
11964    Glucose     231.00        2017-05-01
11964    Glucose     202.00        2017-07-01
11964    Glucose     194.00        2017-09-02
11964    Creatinine    1.10        2017-05-01
11964    Creatinine    1.28        2017-08-14

to this?

patient_id  hba1c_earliest hba1c_latest hba1c_change glucose_earliest glucose_latest/
    11964      8.60           8.60          0.0000        231.0           194.0   
glucose_change creatinine_earliest creatinine_latest creatinine_change
     -0.1602         1.10               1.28             0.1636

For the extended dataframe:

.*_earliest columns should include that lab result with the earliest date. .*_latest columns should include that lab result with the latest date. .*_change columns should hold the relative change (variation), (Latest - Earliest) / Earliest.

Upvotes: 1

Views: 55

Answers (1)

jezrael
jezrael

Reputation: 863156

Use:

print (df.dtypes)
patient_id             int64 <- not necessary
test                  object <- not necessary
test_value           float64 <- necessary
date_taken    datetime64[ns] <- necessary
dtype: object

df = (df.sort_values(['patient_id','test','date_taken'])
       .groupby(['patient_id','test'])['test_value']
       .agg([('earliest','first'),('latest','last')])
       .assign(change = lambda x: (x['latest'] - x['earliest'])/ x['earliest'])
       .unstack()
       .swaplevel(0,1, axis=1)
       .reindex(columns=df['test'].unique(), level=0)
       )
df.columns = df.columns.map('_'.join)
df = df.reset_index()
print (df)
   patient_id  HBA1C_earliest  HBA1C_latest  HBA1C_change  Glucose_earliest  \
0       11964             8.6           8.6           0.0             231.0   

   Glucose_latest  Glucose_change  Creatinine_earliest  Creatinine_latest  \
0           194.0       -0.160173                  1.1               1.28   

   Creatinine_change  
0           0.163636  

Explanation:

  1. First sort_values by multiple columns
  2. Aggregate by agg with first and last for earliest and latest columns.
  3. Create new column by assign
  4. Rehape by unstack
  5. Swap levels in MulriIndex in columns by swaplevel
  6. Then reindex for same order like in original column
  7. Flatten MultiIndex by map with join in columns
  8. Last reset_index for column from index.

Upvotes: 2

Related Questions