Map unique values by ID to separate columns in Pandas

Question

Suppose I have the following DataFrame:

ID	Diagnosis	Test Result
1	Cancer	Positive
1	TB	Negative
1	Lupus	Indeterminate
2	Cancer	Negative
2	TB	Negative
2	Myopia	Negative
2	Hypertension	Negative

Which is in this wide format marked by ID. I want to truncate the rows to a single entry by creating more columns on the unique values by each ID. I would like my DataFrame to look like:

ID	Diagnosis_1	Diagnosis_2	Diagnosis_3	Diagnosis_4	Test Result_1	Test Result_2	Test Result_3	Test Result_4
1	Cancer	TB	Lupus		Positive	Negative	Indeterminate
2	Cancer	TB	Myopia	Hypertension	Negative	Negative	Negative	Negative

A couple of things to note:

It is ok to have extra columns for IDs where the entry does not exist (see ID 1)
The number of columns is decided by the ID with the most entries (but this may not be the same ID for each column)
Does Pandas have a clean way to do this with some methods?

Asish M. · Accepted Answer

In [97]: tmp = df.assign(result_num=df.groupby('ID').cumcount()+1).set_index(['ID', 'result_num']).unstack()
    ...: tmp.columns = [f'{a}_{b}' for a,b in tmp.columns]
    ...: tmp
Out[97]:
   Diagnosis_1 Diagnosis_2 Diagnosis_3   Diagnosis_4 Test Result_1 Test Result_2  Test Result_3 Test Result_4
ID
1       Cancer          TB       Lupus           NaN      Positive      Negative  Indeterminate           NaN
2       Cancer          TB      Myopia  Hypertension      Negative      Negative       Negative      Negative

assign a group serial number using df.groupby('ID').cumcount() + 1
set index to both ID and the serial number
then unstack the serial number column to convert it to columns
the last step flattens the multi-indexed column labels
you can add a .fillna('') to get rid of the NaNs

Map unique values by ID to separate columns in Pandas

Answers (1)

Related Questions