Kedar_dg
Kedar_dg

Reputation: 41

Pandas : Split a column into n columns based on repeating index

I have a dataframe with repeating index : disease & a count column. I want to split this count column into multiple columns , for all repetitions of the index.

For e.g. There should be one index cancer followed by each count in a separate column & so on. How do I go about this task using Pandas functions. Tried Pivot & unstack , but getting errors.

                    count
disease                  
cancer          43.904762
cardiovascular   7.433333
stroke          17.642857
depression      45.623810
rehab           18.890476
vaccine         32.476190
diarrhea        16.280952
obesity         34.261905
diabetes        38.042857
cancer          62.909524
cardiovascular  18.671429
stroke          44.619048
depression      12.242857
rehab           23.823810
vaccine         29.342857
diarrhea        24.771429
obesity         21.519048
diabetes        23.238095
cancer          49.214286
cardiovascular  21.557143

Desired outcome

count             count_1    count_2    count_3
disease                                        
cancer          43.904762  62.909524  49.214286
cardiovascular   7.433333  18.671429  21.557143
depression      45.623810  12.242857        NaN
diabetes        38.042857  23.238095        NaN
diarrhea        16.280952  24.771429        NaN
obesity         34.261905  21.519048        NaN
rehab           18.890476  23.823810        NaN
stroke          17.642857  44.619048        NaN
vaccine         32.476190  29.342857        NaN

Upvotes: 1

Views: 412

Answers (1)

Shubham Sharma
Shubham Sharma

Reputation: 71689

Use, DataFrame.melt to melt the dateframe with id_var disease and use var name count, then add the sequential counter to the column count by grouping the dataframe on disease column and using Groupby.cumcount, finally use DataFrame.pivot to pivot the dataframe on disease as index and count as column:

# STEP 1
df1 = df.reset_index().melt('disease', var_name='count')

# STEP 2
df1['count'] += '_' + df1.groupby('disease').cumcount().add(1).astype(str)

# STEP 3
df1 = df1.pivot('disease', 'count').droplevel(0, 1)

Steps:

# STEP 1
           disease  count      value
0           cancer  count  43.904762
1   cardiovascular  count   7.433333
2           stroke  count  17.642857
...
...
17        diabetes  count  23.238095
18          cancer  count  49.214286
19  cardiovascular  count  21.557143

# STEP 2
           disease    count      value
0           cancer  count_1  43.904762
1   cardiovascular  count_1   7.433333
2           stroke  count_1  17.642857
...
...
17        diabetes  count_2  23.238095
18          cancer  count_3  49.214286
19  cardiovascular  count_3  21.557143

# STEP 3 (result)
count             count_1    count_2    count_3
disease                                        
cancer          43.904762  62.909524  49.214286
cardiovascular   7.433333  18.671429  21.557143
depression      45.623810  12.242857        NaN
diabetes        38.042857  23.238095        NaN
diarrhea        16.280952  24.771429        NaN
obesity         34.261905  21.519048        NaN
rehab           18.890476  23.823810        NaN
stroke          17.642857  44.619048        NaN
vaccine         32.476190  29.342857        NaN

Upvotes: 2

Related Questions