Reputation: 587
I have a data frame that has vertical jump power for a group of athletes over an 8 week tim period. One athlete only completed seven weeks. I want to calculate a new variable called "percent.change" which calculates the percent difference from the first week for each of the weeks through the study period. I have been trying to use dplyr to figure this out but I'm stuck. I'm wondering if anyone out there has a straightforward solution.
The data frame is called weeklyPower. A sample of weeklyPower is below:
athlt week power
E 1 25.20015
E 2 25.54569
E 3 24.52463
E 4 24.88044
E 5 25.11421
E 6 25.86154
E 7 26.08613
E 8 25.90775
K 1 29.74277
K 2 28.80131
K 3 28.96818
K 4 29.62439
K 5 29.98119
K 6 29.11570
K 7 29.96380
T 1 25.02413
T 2 23.75867
T 3 25.25716
T 4 24.73285
T 5 27.02891
T 6 25.60140
T 7 25.64665
T 8 24.38937
Thank you very much for any thoughts.
Matt
Upvotes: 5
Views: 294
Reputation: 185
Sorry for my belated answer but I found this useful especially using a reference period. Hope this helps.
df %>% group_by(athlt) %>%
arrange(athlt, week) %>% # first arrange data to get series and then count power per athlete and week
mutate(wk.growth = round(power/lag(power,1)*100, 1), # creat weekly delta based on the week before
ref.growth = round((power / first(power)*100), 1), # creat weekly delta based on the reference week
n.week = length(athlt)) %>%
replace_na(list(wk.growth=100)) %>% # replace NA with 100 to calculate changes
mutate(d.growth = wk.growth - first(wk.growth), # add change rates
d.ref.growth = ref.growth - first(ref.growth) # add change rates with reference week
)
Upvotes: 0
Reputation: 21621
Using dplyr
you could do:
df %>%
group_by(athlt) %>%
arrange(week) %>%
mutate(cp = power / first(power) * 100)
Which gives:
#Source: local data frame [23 x 4]
#Groups: athlt
# athlt week power cp
#1 E 1 25.20015 100.00000
#2 E 2 25.54569 101.37118
#3 E 3 24.52463 97.31938
#4 E 4 24.88044 98.73132
#5 E 5 25.11421 99.65897
#6 E 6 25.86154 102.62455
#7 E 7 26.08613 103.51577
#8 E 8 25.90775 102.80792
#9 K 1 29.74277 100.00000
#10 K 2 28.80131 96.83466
Or another option:
df %>%
group_by(athlt) %>%
mutate(cp = power / power[which.min(week)] * 100)
Upvotes: 7
Reputation: 887223
Here is a similar option with data.table
, where we convert the data.frame
to data.table
(setDT(df)
), for each "athlt" group, we order
by "week" and create a column 'cp' by the fast assignment (:=
) by reference.
library(data.table)
setDT(df)[order(week), cp := power/power[1L]*100 ,by=athlt]
# athlt week power cp
#1: E 1 25.20015 100.00000
#2: E 2 25.54569 101.37118
#3: E 3 24.52463 97.31938
#4: E 4 24.88044 98.73132
#5: E 5 25.11421 99.65897
#6: E 6 25.86154 102.62455
We can also use setorder
, which in general is memory efficient as it reorders the dataset by reference. But, in this case (as @Arun mentioned in the comments), the above method would be also very efficient as order
computes only the indices instead of reordering the entire dataset.
setorder(setDT(df),athlt,week)[, cp:= power/power[1L] *100, athlt][]
Or if the "week" is numeric, you may use which.min
without using order
setDT(df)[, cp := power/power[which.min(week)]*100, by=athlt]
Upvotes: 4
Reputation: 4335
> Data$percent_change <- unlist(
+ tapply(Data$power, Data$athlt, function(x) c(NA, 100*x[-1]/x[1]) )
+ )
> Data
athlt week power percent_change
1 E 1 25.20015 NA
2 E 2 25.54569 101.37118
3 E 3 24.52463 97.31938
4 E 4 24.88044 98.73132
5 E 5 25.11421 99.65897
6 E 6 25.86154 102.62455
7 E 7 26.08613 103.51577
8 E 8 25.90775 102.80792
9 K 1 29.74277 NA
10 K 2 28.80131 96.83466
11 K 3 28.96818 97.39570
12 K 4 29.62439 99.60199
13 K 5 29.98119 100.80161
14 K 6 29.11570 97.89169
15 K 7 29.96380 100.74314
16 T 1 25.02413 NA
17 T 2 23.75867 94.94304
18 T 3 25.25716 100.93122
19 T 4 24.73285 98.83600
20 T 5 27.02891 108.01139
21 T 6 25.60140 102.30685
22 T 7 25.64665 102.48768
23 T 8 24.38937 97.46341
Upvotes: 1