Reputation: 15
So I have the following (sample) dataframe:
In [1]:
import numpy as np
import panda as pd
df = pd.DataFrame([[a,a,a,a,a,a,b,b,c,d],[Ankle Circles, Ankle Pumps, Static Glutes, Static Quads, Static Quads,Breathing Exercises, Heel Slides, Standing Hip, Ankle Circles, Ankle Pumps], [0,10,0,0,0,10,20,30,10,0]], columns = ['ID', 'exercise_title', 'exercise_duration'])
In [2]: df
Out[2]:
ID exercise_title exercise_duration
a Ankle Circles 0.0
a Ankle Pumps 10.0
a Static Glutes 0.0
a Static Quads 0.0
a Static Quads 0.0
a Breathing Exercise 10.0
b Heel Slides 20.0
b Standing Hip 30.0
c Ankle Circles 10.0
d Ankle Pumps 0.0
Above, is a shortened version of the dataset. There are 90 different exercise titles and I wish to create a new dataframe of which will group the ID's together and have 2 columns for each exercise title:
1 - One with the sum of the time taken for each exercise 2 - One with a Yes/No answer of whether that patient carried out that exercise or not.
So I want it to look like this, but bigger as there are actually 90 different exercise titles:
In [3]:
Out[3]:
ID Ankle_Circles_duration Ankle_Circles Ankle_Pumps_duration Ankle_Pumps Static_Glutes_duration Static_Glutes Static_quads_duration Static_quads Breathing_Exercises_duration Breathing_Exercises Heel_Slides_duration Heel_Slides Standing_Hip_duration Standing_Hip
a 0.0 No 10.0 Yes 0.0 No 0.0 No 0.0 No 0.0 No 0.0 No
b 0.0 No 0.0 No 0.0 No 0.0 No 10.0 Yes 20.0 Yes 0.0 No
c 10.0 Yes 0.0 No 0.0 No 0.0 No 0.0 No 0.0 No 0.0 No
d 0.0 No 0.0 No 0.0 No 0.0 No 0.0 No 0.0 No 0.0 No
I have tried the following coding but this is only for the first two columns and I can't do this coding for all the 90 exercise title types as it would take too long, so I was wondering if there is a more efficient and quicker way to do this?
ankle_circles_duration = df[df['exercise_title'] == 'Ankle circles'].groupby('ID').sum()['exercise_duration']
exercise_new['ankle_circles_duration'] = exercise_new['ankle_circles_duration'].fillna(0)
exercise_new.loc[exercise_new['ankle_circles_duration'] >0, 'ankle_circles'] = 'Yes'
exercise_new.loc[exercise_new['ankle_circles_duration'] == 0, 'ankle_circles'] = 'No'
Thank you.
Upvotes: 1
Views: 54
Reputation: 6483
You can try something like this, using pivot
, and then np.where
:
df=df.drop_duplicates()
df=df.pivot(index='ID', columns='exercise_title', values='exercise_duration').fillna(0)
newdf=pd.DataFrame(index=df.index)
for col in df.columns:
newdf[col+'_duration']=df[col]
newdf[col]=np.where(df[col].eq(0),'No','Yes')
print(newdf)
Output:
df with pivot:
exercise_title Ankle Circles Ankle Pumps Breathing Exercise Heel Slides Standing Hip Static Glutes Static Quads
ID
a 0.0 10.0 10.0 0.0 0.0 0.0 0.0
b 0.0 0.0 0.0 20.0 30.0 0.0 0.0
c 10.0 0.0 0.0 0.0 0.0 0.0 0.0
d 0.0 0.0 0.0 0.0 0.0 0.0 0.0
newdf:
Ankle_Circles_duration Ankle_Circles Ankle_Pumps_duration Ankle_Pumps ... Static_Glutes_duration Static_Glutes Static_Quads_duration Static_Quads
ID ...
a 0.0 No 10.0 Yes ... 0.0 No 0.0 No
b 0.0 No 0.0 No ... 0.0 No 0.0 No
c 10.0 Yes 0.0 No ... 0.0 No 0.0 No
d 0.0 No 0.0 No ... 0.0 No 0.0 No
Upvotes: 2