Afr0
Afr0

Reputation: 41

Map unique values by ID to separate columns in Pandas

Suppose I have the following DataFrame:

ID Diagnosis Test Result
1 Cancer Positive
1 TB Negative
1 Lupus Indeterminate
2 Cancer Negative
2 TB Negative
2 Myopia Negative
2 Hypertension Negative

Which is in this wide format marked by ID. I want to truncate the rows to a single entry by creating more columns on the unique values by each ID. I would like my DataFrame to look like:

ID Diagnosis_1 Diagnosis_2 Diagnosis_3 Diagnosis_4 Test Result_1 Test Result_2 Test Result_3 Test Result_4
1 Cancer TB Lupus Positive Negative Indeterminate
2 Cancer TB Myopia Hypertension Negative Negative Negative Negative

A couple of things to note:

Upvotes: 0

Views: 479

Answers (1)

Asish M.
Asish M.

Reputation: 2647

In [97]: tmp = df.assign(result_num=df.groupby('ID').cumcount()+1).set_index(['ID', 'result_num']).unstack()
    ...: tmp.columns = [f'{a}_{b}' for a,b in tmp.columns]
    ...: tmp
Out[97]:
   Diagnosis_1 Diagnosis_2 Diagnosis_3   Diagnosis_4 Test Result_1 Test Result_2  Test Result_3 Test Result_4
ID
1       Cancer          TB       Lupus           NaN      Positive      Negative  Indeterminate           NaN
2       Cancer          TB      Myopia  Hypertension      Negative      Negative       Negative      Negative
  1. assign a group serial number using df.groupby('ID').cumcount() + 1
  2. set index to both ID and the serial number
  3. then unstack the serial number column to convert it to columns
  4. the last step flattens the multi-indexed column labels
  5. you can add a .fillna('') to get rid of the NaNs

Upvotes: 1

Related Questions