Fromanothersea
Fromanothersea

Reputation: 15

Create new dataframe based on values of other columns whilst grouping by ID

So I have the following (sample) dataframe:

In [1]: 
import numpy as np
import panda as pd
df = pd.DataFrame([[a,a,a,a,a,a,b,b,c,d],[Ankle Circles, Ankle Pumps, Static Glutes, Static Quads, Static Quads,Breathing Exercises, Heel Slides, Standing Hip, Ankle Circles, Ankle Pumps], [0,10,0,0,0,10,20,30,10,0]], columns = ['ID', 'exercise_title', 'exercise_duration'])

In [2]: df
Out[2]:
        ID             exercise_title             exercise_duration
        a               Ankle Circles                      0.0
        a               Ankle Pumps                       10.0
        a               Static Glutes                      0.0
        a               Static Quads                       0.0
        a               Static Quads                       0.0
        a               Breathing Exercise                10.0
        b               Heel Slides                       20.0
        b               Standing Hip                      30.0
        c               Ankle Circles                     10.0
        d               Ankle Pumps                        0.0

Above, is a shortened version of the dataset. There are 90 different exercise titles and I wish to create a new dataframe of which will group the ID's together and have 2 columns for each exercise title:

1 - One with the sum of the time taken for each exercise 2 - One with a Yes/No answer of whether that patient carried out that exercise or not.

So I want it to look like this, but bigger as there are actually 90 different exercise titles:

In [3]:
Out[3]:
        ID           Ankle_Circles_duration  Ankle_Circles   Ankle_Pumps_duration  Ankle_Pumps   Static_Glutes_duration   Static_Glutes  Static_quads_duration  Static_quads  Breathing_Exercises_duration   Breathing_Exercises   Heel_Slides_duration   Heel_Slides   Standing_Hip_duration   Standing_Hip 
        a                  0.0                   No                  10.0              Yes                0.0                      No             0.0               No                    0.0                           No                    0.0              No              0.0                     No
        b                  0.0                   No                   0.0               No                0.0                      No             0.0               No                   10.0                          Yes                   20.0             Yes              0.0                     No        
        c                 10.0                  Yes                   0.0               No                0.0                      No             0.0               No                    0.0                           No                    0.0              No              0.0                     No  
        d                  0.0                   No                   0.0               No                0.0                      No             0.0               No                    0.0                           No                    0.0              No              0.0                     No 

I have tried the following coding but this is only for the first two columns and I can't do this coding for all the 90 exercise title types as it would take too long, so I was wondering if there is a more efficient and quicker way to do this?

ankle_circles_duration = df[df['exercise_title'] == 'Ankle circles'].groupby('ID').sum()['exercise_duration']
exercise_new['ankle_circles_duration'] = exercise_new['ankle_circles_duration'].fillna(0)
exercise_new.loc[exercise_new['ankle_circles_duration'] >0, 'ankle_circles'] = 'Yes'
exercise_new.loc[exercise_new['ankle_circles_duration'] == 0, 'ankle_circles'] = 'No'

Thank you.

enter image description here

enter image description here

enter image description here

enter image description here

Upvotes: 1

Views: 54

Answers (1)

MrNobody33
MrNobody33

Reputation: 6483

You can try something like this, using pivot, and then np.where:

df=df.drop_duplicates()
df=df.pivot(index='ID', columns='exercise_title', values='exercise_duration').fillna(0)
newdf=pd.DataFrame(index=df.index)

for col in df.columns:
    newdf[col+'_duration']=df[col]
    newdf[col]=np.where(df[col].eq(0),'No','Yes')

print(newdf)

Output:

df with pivot:
exercise_title  Ankle Circles  Ankle Pumps  Breathing Exercise  Heel Slides  Standing Hip  Static Glutes  Static Quads
ID
a                         0.0         10.0                10.0          0.0           0.0            0.0           0.0
b                         0.0          0.0                 0.0         20.0          30.0            0.0           0.0
c                        10.0          0.0                 0.0          0.0           0.0            0.0           0.0
d                         0.0          0.0                 0.0          0.0           0.0            0.0           0.0

newdf:

    Ankle_Circles_duration Ankle_Circles  Ankle_Pumps_duration Ankle_Pumps  ...  Static_Glutes_duration Static_Glutes  Static_Quads_duration Static_Quads
ID                                                                          ...
a                      0.0            No                  10.0         Yes  ...                     0.0            No                    0.0           No
b                      0.0            No                   0.0          No  ...                     0.0            No                    0.0           No
c                     10.0           Yes                   0.0          No  ...                     0.0            No                    0.0           No
d                      0.0            No                   0.0          No  ...                     0.0            No                    0.0           No

Upvotes: 2

Related Questions