Cam

Percentage change in grouped data: calculate against first value of group

I'm trying to get the percentage change between the first value (of one variable) in a group and every other value (of the same variable) in the same group.

Example data:

df = data.frame(group = c(rep('A',4), rep('B',3)),
            response = c(1,4,2,1,1,2,3),
            treatment = c("control","100mg","200mg","50mg","control","100mg","200mg"))

> df
  group response treatment
1     A        1   control
2     A        4     100mg
3     A        2     200mg
4     A        1      50mg
5     B        1   control
6     B        2     100mg
7     B        3     200mg

In other words, I'd like to get the percentage change in response relative to the treatment 'control' for all other levels of treatment in the same group. The number of levels of treatment can vary group by group.
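As a formula, this is a restatement of the request above. For row i, c is the row in the same group whose treatment is "control":

```latex
\text{pct}_i = \left( \frac{\text{response}_i}{\text{response}_c} - 1 \right) \times 100
```

For example, in group A the control response is 1. The 200mg row therefore gives (2 / 1 - 1) * 100 = 100, and the 50mg row gives (1 / 1 - 1) * 100 = 0.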

What I have so far:

library(dplyr)

# function for % change
pct <- function(x) {(x/lag(x)-1)*100}

# group data and apply function
percChange <- df %>% 
  group_by(group) %>% 
  mutate_at(vars(response), pct)

# the output (percChange) is:

#   group response treatment
# 1 A        NA   control  
# 2 A       300   100mg    
# 3 A       -50   200mg    
# 4 A       -50   50mg     
# 5 B        NA   control  
# 6 B       100   100mg    
# 7 B        50   200mg
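The output above differs from the target because of what dplyr's lag() returns. Below, group A's responses are used as a stand-in for one group (the vector x is just for illustration). lag() shifts the vector down by one position, so each value is divided by the value in the previous row. first() returns a single value, which R recycles against every element.

```r
library(dplyr)

x <- c(1, 4, 2, 1)  # group A's responses from the example data

lag(x)    # NA 1 4 2: each element is paired with the element before it
first(x)  # 1: one value, recycled against every element of x

(x / lag(x) - 1) * 100    # NA 300 -50 -50 (change from the previous row)
(x / first(x) - 1) * 100  # 0 300 100 0 (change from the first row)
```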

But the output I would like is:

#   group response treatment
# 1 A        NA   control  
# 2 A       300   100mg    
# 3 A       100   200mg    
# 4 A         0   50mg     
# 5 B        NA   control  
# 6 B       100   100mg    
# 7 B       200   200mg

I have looked everywhere and found similar things but none are quite what I'm after. Thanks.

Answers (2)

nsinghphd

JasonAizkalns answered it well, but in case you want to keep your pct function, fixing one small error in it makes it work. Divide by the group's first value, x[1], instead of the previous value, lag(x).

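# % change of each value relative to the first value of x (the group's control row when applied per group)
pct <- function(x) {
  ((x-x[1])/x[1]) * 100
}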

> percChange
# A tibble: 7 x 3
# Groups:   group [2]
  group response treatment
  <fct>    <dbl> <fct>    
1 A            0 control  
2 A          300 100mg    
3 A          100 200mg    
4 A            0 50mg     
5 B            0 control  
6 B          100 100mg    
7 B          200 200mg    
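The printed percChange above comes from re-running the question's pipeline with the fixed pct(). Below is a self-contained sketch of that run. It uses mutate(across()) (dplyr 1.0.0 or later) instead of mutate_at(). That swap is my choice rather than the answerer's code, and it should give the same result.

```r
library(dplyr)

df <- data.frame(
  group = c(rep("A", 4), rep("B", 3)),
  response = c(1, 4, 2, 1, 1, 2, 3),
  treatment = c("control", "100mg", "200mg", "50mg", "control", "100mg", "200mg")
)

# % change of each value relative to the first value of x
pct <- function(x) {
  ((x - x[1]) / x[1]) * 100
}

# group_by() makes pct() run once per group, so x[1] is each group's first row
percChange <- df %>%
  group_by(group) %>%
  mutate(across(response, pct))

percChange
```

Note that this overwrites response in place. Use mutate(pct_change = pct(response)) instead if you want to keep the raw values.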

JasonAizkalns

You want to use dplyr's first(), which returns the first value of response within each group:

library(tidyverse)

df = data.frame(
  group = c(rep('A',4), rep('B',3)),
  response = c(1,4,2,1,1,2,3),
  treatment = c("control","100mg","200mg","50mg","control","100mg","200mg")
)

df %>%
  group_by(group) %>%
  mutate(
    resp_pct_chg_from_first = (response / first(response) - 1) * 100
  )
#> # A tibble: 7 x 4
#> # Groups:   group [2]
#>   group response treatment resp_pct_chg_from_first
#>   <fct>    <dbl> <fct>                       <dbl>
#> 1 A            1 control                         0
#> 2 A            4 100mg                         300
#> 3 A            2 200mg                         100
#> 4 A            1 50mg                            0
#> 5 B            1 control                         0
#> 6 B            2 100mg                         100
#> 7 B            3 200mg                         200

Created on 2019-03-20 by the reprex package (v0.2.1)
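first() uses whichever row happens to come first in each group. It therefore matches the example only because "control" is listed first. Below is a hedged alternative that is not taken from either answer: look up the baseline by its treatment label instead. The sketch assumes exactly one "control" row per group. It also uses if_else() to return NA for the control row itself, as in the question's desired output. To show that row order no longer matters, the rows are reversed first.

```r
library(dplyr)

df <- data.frame(
  group = c(rep("A", 4), rep("B", 3)),
  response = c(1, 4, 2, 1, 1, 2, 3),
  treatment = c("control", "100mg", "200mg", "50mg", "control", "100mg", "200mg")
)

# Reverse the rows so that "control" is no longer the first row of each group
df_reversed <- df[nrow(df):1, ]

result <- df_reversed %>%
  group_by(group) %>%
  mutate(
    # Baseline is the response of the row labelled "control" in this group;
    # assumes exactly one such row per group
    resp_pct_chg_from_control = if_else(
      treatment == "control",
      NA_real_,  # NA for the control row itself, as in the desired output
      (response / response[treatment == "control"] - 1) * 100
    )
  ) %>%
  ungroup()

result
```

With the rows reversed, first() would take group B's 200mg row as the baseline. The label-based lookup still gives 300, 100 and 0 for group A's 100mg, 200mg and 50mg rows, and 100 and 200 for group B.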
