Reputation: 41
I have a dataframe with repeating index : disease
& a count
column.
I want to split this count
column into multiple columns , for all repetitions of the index.
For e.g. There should be one index cancer
followed by each count in a separate column & so on.
How do I go about this task using Pandas functions.
Tried Pivot & unstack
, but getting errors.
count
disease
cancer 43.904762
cardiovascular 7.433333
stroke 17.642857
depression 45.623810
rehab 18.890476
vaccine 32.476190
diarrhea 16.280952
obesity 34.261905
diabetes 38.042857
cancer 62.909524
cardiovascular 18.671429
stroke 44.619048
depression 12.242857
rehab 23.823810
vaccine 29.342857
diarrhea 24.771429
obesity 21.519048
diabetes 23.238095
cancer 49.214286
cardiovascular 21.557143
Desired outcome
count count_1 count_2 count_3
disease
cancer 43.904762 62.909524 49.214286
cardiovascular 7.433333 18.671429 21.557143
depression 45.623810 12.242857 NaN
diabetes 38.042857 23.238095 NaN
diarrhea 16.280952 24.771429 NaN
obesity 34.261905 21.519048 NaN
rehab 18.890476 23.823810 NaN
stroke 17.642857 44.619048 NaN
vaccine 32.476190 29.342857 NaN
Upvotes: 1
Views: 412
Reputation: 71689
Use, DataFrame.melt
to melt the dateframe with id_var disease
and use var name count
, then add the sequential counter to the column count
by grouping the dataframe on disease
column and using Groupby.cumcount
, finally use DataFrame.pivot
to pivot the dataframe on disease
as index and count
as column:
# STEP 1
df1 = df.reset_index().melt('disease', var_name='count')
# STEP 2
df1['count'] += '_' + df1.groupby('disease').cumcount().add(1).astype(str)
# STEP 3
df1 = df1.pivot('disease', 'count').droplevel(0, 1)
Steps:
# STEP 1
disease count value
0 cancer count 43.904762
1 cardiovascular count 7.433333
2 stroke count 17.642857
...
...
17 diabetes count 23.238095
18 cancer count 49.214286
19 cardiovascular count 21.557143
# STEP 2
disease count value
0 cancer count_1 43.904762
1 cardiovascular count_1 7.433333
2 stroke count_1 17.642857
...
...
17 diabetes count_2 23.238095
18 cancer count_3 49.214286
19 cardiovascular count_3 21.557143
# STEP 3 (result)
count count_1 count_2 count_3
disease
cancer 43.904762 62.909524 49.214286
cardiovascular 7.433333 18.671429 21.557143
depression 45.623810 12.242857 NaN
diabetes 38.042857 23.238095 NaN
diarrhea 16.280952 24.771429 NaN
obesity 34.261905 21.519048 NaN
rehab 18.890476 23.823810 NaN
stroke 17.642857 44.619048 NaN
vaccine 32.476190 29.342857 NaN
Upvotes: 2