myfatson

Reputation: 549

Percent change for grouped subjects at multiple timepoints in R

id  timepoint  dv.a
1   baseline   100       
1   1min       105       
1   2min       90        
2   baseline   70        
2   1min       100       
2   2min       80        
3   baseline   80        
3   1min       80        
3   2min       90       

I have repeated measures data for each subject in long format, as above. I'd like to express each measurement as a percentage of that subject's baseline value (so every baseline row is 100), giving the result below.

id  timepoint  dv   pct.chg 
1   baseline   100  100     
1   1min       105  105     
1   2min       90   90      
2   baseline   70   100     
2   1min       100  143     
2   2min       80   114     
3   baseline   80   100     
3   1min       80   100     
3   2min       90   113    

Upvotes: 2

Views: 690

Answers (4)

hello_friend

Reputation: 5788

Base R solution (assuming "baseline" is always the first record within each id):

# Split by id, express dv as a percentage of the first (baseline) row, then recombine
data.frame(do.call("rbind", lapply(split(df, df$id), 
       function(x){x$pct.chg <- x$dv / x$dv[1] * 100; return(x)})), row.names = NULL)

Data:

 df <- structure(
  list(
    id = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L),
    timepoint = c(
      "baseline",
      "1min",
      "2min",
      "baseline",
      "1min",
      "2min",
      "baseline",
      "1min",
      "2min"
    ),
    dv = c(100L, 105L, 90L, 70L, 100L, 80L, 80L, 80L, 90L)
  ),
  class = "data.frame",
  row.names = c(NA,-9L)
)
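
If the baseline row is not guaranteed to come first within each id, one possible base R variant looks the baseline up by its label instead of by position. This is only a sketch: it rebuilds the question's data inline and assumes every id has exactly one row labelled "baseline"; the helper name `base` is made up for illustration.

```r
# Same values as the question's data
df <- data.frame(
  id = rep(1:3, each = 3),
  timepoint = rep(c("baseline", "1min", "2min"), times = 3),
  dv = c(100, 105, 90, 70, 100, 80, 80, 80, 90)
)

# One baseline row per id (assumption: exactly one "baseline" per id)
base <- df[df$timepoint == "baseline", c("id", "dv")]

# Match each row to its id's baseline and scale to a percentage
df$pct.chg <- df$dv / base$dv[match(df$id, base$id)] * 100
df
```

Because the lookup goes through `match()`, the rows can be in any order.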

Upvotes: 0

Daniel O

Reputation: 4358

In base R you can do this (it assumes exactly three rows per id, with the baseline row first):

for(i in 1:(NROW(df)/3)){
  df[1+3*(i-1),4] <- 100
  df[2+3*(i-1),4] <- df[2+3*(i-1),3]/df[1+3*(i-1),3]*100
  df[3+3*(i-1),4] <- df[3+3*(i-1),3]/df[1+3*(i-1),3]*100
}

colnames(df)[4] <- "pct.chg"

output:

> df
  id timepoint dv.a  pct.chg
1  1  baseline  100 100.0000
2  1      1min  105 105.0000
3  1      2min   90  90.0000
4  2  baseline   70 100.0000
5  2      1min  100 142.8571
6  2      2min   80 114.2857
7  3  baseline   80 100.0000
8  3      1min   80 100.0000
9  3      2min   90 112.5000

Upvotes: 0

MMerry

Reputation: 334

Try creating a numeric helper column, then group by id and arrange on the helper so the baseline row comes first within each id. Then use the window function first() inside mutate():

library(dplyr)
library(stringr)

df %>% 
  # Helper column: minutes as a number, with "baseline" treated as time 0
  mutate(clean_timepoint = str_remove(timepoint, "min") %>% if_else(. == "baseline", "0", .) %>% as.numeric()) %>% 
  group_by(id) %>% 
  arrange(id, clean_timepoint) %>% 
  mutate(pct.chg = (dv / first(dv)) * 100) %>% 
  select(-clean_timepoint)

Upvotes: 0

Gavin Kelly

Reputation: 2414

library(dplyr)

# Example data: 4 ids x 3 timepoints, with random dv values
df <- expand.grid(time = c("baseline", "1", "2"), id = 1:4)
df$dv <- sample(100, 12)
df %>% group_by(id) %>%
  mutate(perc = dv * 100 / dv[time == "baseline"]) %>%
  ungroup()

You want to do something for each 'id' group, so that's the group_by(). You need to create a new column, so there's a mutate(). The new variable is the old dv scaled by the value dv takes at baseline within that group, hence the expression inside the mutate(). Finally, ungroup() removes the grouping you applied.
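
As a rough illustration, the same group_by/mutate/ungroup steps can be applied to the question's own data. This sketch assumes the columns are named `id`, `timepoint` and `dv` (as in the question's desired output) and that each id has exactly one "baseline" row; the name `result` is just for illustration.

```r
library(dplyr)

# Same values as the question's data
df <- data.frame(
  id = rep(1:3, each = 3),
  timepoint = rep(c("baseline", "1min", "2min"), times = 3),
  dv = c(100, 105, 90, 70, 100, 80, 80, 80, 90)
)

result <- df %>%
  group_by(id) %>%                                           # work within each subject
  mutate(pct.chg = dv * 100 / dv[timepoint == "baseline"]) %>%  # scale by that subject's baseline
  ungroup()                                                  # drop the grouping again

result
```

The values are left unrounded here; wrap the expression in round() if whole numbers like those in the question are wanted, bearing in mind that R rounds exact halves to the even digit.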

Upvotes: 2
