Matt Jordan
Matt Jordan

Reputation: 587

calculate a percentage from the start of an intervention in R

I have a data frame that has vertical jump power for a group of athletes over an 8 week tim period. One athlete only completed seven weeks. I want to calculate a new variable called "percent.change" which calculates the percent difference from the first week for each of the weeks through the study period. I have been trying to use dplyr to figure this out but I'm stuck. I'm wondering if anyone out there has a straightforward solution.

The data frame is called weeklyPower. A sample of weeklyPower is below:

athlt week     power
E      1       25.20015
E      2       25.54569
E      3       24.52463
E      4       24.88044
E      5       25.11421
E      6       25.86154
E      7       26.08613
E      8       25.90775
K      1       29.74277
K      2       28.80131
K      3       28.96818
K      4       29.62439
K      5       29.98119
K      6       29.11570
K      7       29.96380
T      1       25.02413
T      2       23.75867
T      3       25.25716
T      4       24.73285
T      5       27.02891
T      6       25.60140
T      7       25.64665
T      8       24.38937 

Thank you very much for any thoughts.

Matt

Upvotes: 5

Views: 294

Answers (4)

chrischi
chrischi

Reputation: 185

Sorry for my belated answer but I found this useful especially using a reference period. Hope this helps.

df %>% group_by(athlt) %>%
      arrange(athlt, week) %>% # first arrange data to get series and then count power per athlete and week
         mutate(wk.growth = round(power/lag(power,1)*100, 1),  # creat weekly delta based on the week before
               ref.growth = round((power / first(power)*100), 1),    # creat weekly delta based on the reference week
               n.week = length(athlt))   %>% 
                  replace_na(list(wk.growth=100)) %>%          # replace NA with 100 to calculate changes
                    mutate(d.growth =  wk.growth - first(wk.growth), # add change rates
                           d.ref.growth = ref.growth - first(ref.growth) # add change rates with reference week
                                                                     )

Upvotes: 0

Steven Beaupré
Steven Beaupré

Reputation: 21621

Using dplyr you could do:

df %>% 
  group_by(athlt) %>% 
  arrange(week) %>% 
  mutate(cp = power / first(power) * 100)

Which gives:

#Source: local data frame [23 x 4]
#Groups: athlt

#   athlt week    power        cp
#1      E    1 25.20015 100.00000
#2      E    2 25.54569 101.37118
#3      E    3 24.52463  97.31938
#4      E    4 24.88044  98.73132
#5      E    5 25.11421  99.65897
#6      E    6 25.86154 102.62455
#7      E    7 26.08613 103.51577
#8      E    8 25.90775 102.80792
#9      K    1 29.74277 100.00000
#10     K    2 28.80131  96.83466

Or another option:

df %>% 
  group_by(athlt) %>% 
  mutate(cp = power / power[which.min(week)] * 100)

Upvotes: 7

akrun
akrun

Reputation: 887223

Here is a similar option with data.table, where we convert the data.frame to data.table (setDT(df)), for each "athlt" group, we order by "week" and create a column 'cp' by the fast assignment (:=) by reference.

library(data.table)
setDT(df)[order(week), cp := power/power[1L]*100 ,by=athlt]
#   athlt week    power        cp
#1:     E    1 25.20015 100.00000
#2:     E    2 25.54569 101.37118
#3:     E    3 24.52463  97.31938
#4:     E    4 24.88044  98.73132
#5:     E    5 25.11421  99.65897
#6:     E    6 25.86154 102.62455

We can also use setorder, which in general is memory efficient as it reorders the dataset by reference. But, in this case (as @Arun mentioned in the comments), the above method would be also very efficient as order computes only the indices instead of reordering the entire dataset.

 setorder(setDT(df),athlt,week)[, cp:= power/power[1L] *100, athlt][]

Or if the "week" is numeric, you may use which.min without using order

 setDT(df)[, cp := power/power[which.min(week)]*100, by=athlt]

Upvotes: 4

Prasanna Nandakumar
Prasanna Nandakumar

Reputation: 4335

> Data$percent_change <-  unlist(
+     tapply(Data$power, Data$athlt, function(x) c(NA, 100*x[-1]/x[1]) )
+ )
> Data
   athlt week    power percent_change
1      E    1 25.20015             NA
2      E    2 25.54569      101.37118
3      E    3 24.52463       97.31938
4      E    4 24.88044       98.73132
5      E    5 25.11421       99.65897
6      E    6 25.86154      102.62455
7      E    7 26.08613      103.51577
8      E    8 25.90775      102.80792
9      K    1 29.74277             NA
10     K    2 28.80131       96.83466
11     K    3 28.96818       97.39570
12     K    4 29.62439       99.60199
13     K    5 29.98119      100.80161
14     K    6 29.11570       97.89169
15     K    7 29.96380      100.74314
16     T    1 25.02413             NA
17     T    2 23.75867       94.94304
18     T    3 25.25716      100.93122
19     T    4 24.73285       98.83600
20     T    5 27.02891      108.01139
21     T    6 25.60140      102.30685
22     T    7 25.64665      102.48768
23     T    8 24.38937       97.46341

Upvotes: 1

Related Questions