user2991591

Reputation: 47

Subtracting the response of the "control" from all other groups

I have a number of treatments with repeated measurements, and I would like to subtract the control value at each time point from the value of every treatment at that time point. The data set is shaped like this, with multiple years, species and treatments.

 ID Year Species Treatment value
 1  2010  x       control   0.04
 1  2011  x       control   0.10
 2  2010  x       MaxDamage 0.02
 2  2011  x       MaxDamage 0.06

I would like to add a column

 difference = (value of the treatment in a given year) - (value of the control in the same year)

 ID Year Species Treatment value  difference
 1  2010  x       control   0.04   0
 1  2011  x       control   0.1    0
 2  2010  x       MaxDamage 0.02  -0.02
 2  2011  x       MaxDamage 0.06  -0.04

Any suggestions would be very welcome, thank you.

Upvotes: 2

Views: 611

Answers (2)

Frank

Reputation: 66819

You could join on a table containing the control values:

library(data.table)
setDT(DF)

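# Update join: look up the control value for each row's Year,
# then add the column d to DF by reference
DF[
  DF[Treatment == "control", .(Year, c_value = value)], 
  on=.(Year), 
  d := value - c_value
][]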

# or
library(dplyr)

# Join the control values back onto every row by Year, then subtract
left_join(DF, 
  DF %>% filter(Treatment == "control") %>% select(Year, c_value = value),
  by = "Year"
) %>% mutate(d = value - c_value) %>% select(-c_value)

The data.table way modifies DF, while dplyr makes a new table.

Data used:

DF = structure(list(ID = c(1L, 1L, 2L, 2L), Year = c(2010L, 2011L, 
2010L, 2011L), Species = c("x", "x", "x", "x"), Treatment = c("control", 
"control", "MaxDamage", "MaxDamage"), value = c(0.04, 0.1, 0.02, 
0.06)), .Names = c("ID", "Year", "Species", "Treatment", "value"
), row.names = c(NA, -4L), class = "data.frame")
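
The question mentions multiple species, while the joins above match on Year only. A minimal sketch of the same dplyr join extended to match on both Year and Species, assuming there is exactly one control row per Year and Species combination. The data frame DF2 and its values for species "y" are made up for illustration and are not from the question:

```r
library(dplyr)

# Hypothetical example data: two species, two years, two treatments
DF2 <- data.frame(
  ID        = c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L),
  Year      = rep(c(2010L, 2011L), times = 4),
  Species   = rep(c("x", "y"), each = 4),
  Treatment = rep(rep(c("control", "MaxDamage"), each = 2), times = 2),
  value     = c(0.04, 0.10, 0.02, 0.06, 0.20, 0.30, 0.25, 0.10),
  stringsAsFactors = FALSE
)

# One control value per Year and Species
controls <- DF2 %>%
  filter(Treatment == "control") %>%
  select(Year, Species, c_value = value)

# Match each row to the control of the same Year and Species, then subtract
res <- DF2 %>%
  left_join(controls, by = c("Year", "Species")) %>%
  mutate(difference = value - c_value) %>%
  select(-c_value)

res
```

If a Year and Species combination has no control row, the join leaves c_value as NA, so the difference for those rows is NA rather than an error.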

Upvotes: 2

akrun

Reputation: 887501

We can group by 'Year' and then take the difference between the 'value' column and the 'value' for which 'Treatment' is "control" (here 'df1' is the example data from the question).

library(dplyr)
df1 %>%
   group_by(Year) %>%
   mutate(difference = value - value[Treatment == "control"])
# A tibble: 4 x 6
# Groups:   Year [2]
#     ID  Year Species Treatment value difference
#  <int> <int> <chr>   <chr>     <dbl>      <dbl>
#1     1  2010 x       control    0.04       0   
#2     1  2011 x       control    0.1        0   
#3     2  2010 x       MaxDamage  0.02      -0.02
#4     2  2011 x       MaxDamage  0.06      -0.04
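
The grouped subtraction above relies on every group containing exactly one "control" row. A minimal sketch of a guarded variant, assuming groups without a single control should get NA instead. The helper control_value is a hypothetical name introduced here, and grouping by Species as well as Year is an assumption based on the question mentioning multiple species:

```r
library(dplyr)

# Same example data as in the question
df1 <- data.frame(
  ID        = c(1L, 1L, 2L, 2L),
  Year      = c(2010L, 2011L, 2010L, 2011L),
  Species   = "x",
  Treatment = c("control", "control", "MaxDamage", "MaxDamage"),
  value     = c(0.04, 0.10, 0.02, 0.06),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the control value of a group,
# or NA unless the group has exactly one control row
control_value <- function(value, treatment) {
  ctrl <- value[treatment == "control"]
  if (length(ctrl) == 1) ctrl else NA_real_
}

res <- df1 %>%
  group_by(Year, Species) %>%
  mutate(difference = value - control_value(value, Treatment)) %>%
  ungroup()

res
```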

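If the rows are ordered in blocks by 'Treatment', and every 'Treatment' has the same 'Year' values in the same order, then instead of grouping we can subset the control 'value' and repeat it once per treatment before taking the difference directly

df1 %>%
    # repeat the control values once per treatment so they line up with every block
    mutate(difference = value - rep(value[Treatment == "control"], times = n_distinct(Treatment)))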

Upvotes: 3
