Reputation: 47
I have a number of treatments with repeated measurements, and I would like to subtract the values of the control for each time point for each treatment. The data set is shaped like this, with multiple years, species and treatments.
ID Year Species Treatment value
1 2010 x control 0.04
1 2011 x control 0.10
2 2010 x MaxDamage 0.02
2 2011 x MaxDamage 0.06
I would like to add a column
difference = (value of the treatment for each year) - (value of the control for the same year)
ID Year Species Treatment value difference
1 2010 x control 0.04 0
1 2011 x control 0.1 0
2 2010 x MaxDamage 0.02 -0.02
2 2011 x MaxDamage 0.06 -0.04
Any suggestions would be very welcome, thank you.
Upvotes: 2
Views: 611
Reputation: 66819
You could join on a table containing the control values:
library(data.table)
setDT(DF)
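# Update join: look up the control value for each row's Year,
# then add column d to DF by reference
DF[
  DF[Treatment == "control", .(Year, c_value = value)],
  on = .(Year),
  d := value - c_value
][]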
# or
library(dplyr)
left_join(
  DF,
  DF %>% filter(Treatment == "control") %>% select(Year, c_value = value),
  by = "Year"  # explicit key; otherwise dplyr joins on all shared columns and prints a message
) %>%
  mutate(d = value - c_value) %>%
  select(-c_value)
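The data.table way modifies DF in place (by reference), while the dplyr way returns a new table and leaves DF unchanged. Both versions join on Year only, so with several species you would add Species to the join key.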
Data used:
DF = structure(list(ID = c(1L, 1L, 2L, 2L), Year = c(2010L, 2011L,
2010L, 2011L), Species = c("x", "x", "x", "x"), Treatment = c("control",
"control", "MaxDamage", "MaxDamage"), value = c(0.04, 0.1, 0.02,
0.06)), .Names = c("ID", "Year", "Species", "Treatment", "value"
), row.names = c(NA, -4L), class = "data.frame")
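Since the question mentions several species, here is a minimal sketch of the same update join keyed on both Year and Species. The table `DT`, its values and the column name `difference` are made up for illustration, and it assumes exactly one control row per Year/Species combination.

```r
library(data.table)

# Hypothetical data: two species, two years, two treatments
DT <- data.table(
  Year      = rep(c(2010L, 2011L), times = 4),
  Species   = rep(c("x", "y"), each = 4),
  Treatment = rep(rep(c("control", "MaxDamage"), each = 2), times = 2),
  value     = c(0.04, 0.10, 0.02, 0.06, 0.50, 0.70, 0.40, 0.90)
)

# One control row per Year/Species combination (assumption)
controls <- DT[Treatment == "control", .(Year, Species, c_value = value)]

# Update join: match on both keys, then add the column by reference
DT[controls, on = .(Year, Species), difference := value - c_value]
```

If a Year/Species combination has no control row, its rows are simply not matched and `difference` stays NA there.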
Upvotes: 2
Reputation: 887501
We can group by 'Year' and, within each group, subtract the 'value' of the row where 'Treatment' is "control" from the 'value' column:
library(dplyr)
df1 %>%
  group_by(Year) %>%
  mutate(difference = value - value[Treatment == "control"])
# A tibble: 4 x 6
# Groups: Year [2]
# ID Year Species Treatment value difference
# <int> <int> <chr> <chr> <dbl> <dbl>
#1 1 2010 x control 0.04 0
#2 1 2011 x control 0.1 0
#3 2 2010 x MaxDamage 0.02 -0.02
#4 2 2011 x MaxDamage 0.06 -0.04
If the 'control' rows appear in the same 'Year' order as the rows of the other 'Treatment', and each 'Year' has exactly two 'Treatment' values, then instead of grouping we can subset the control 'value's, recycle them, and take the difference directly:
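df1 %>%
  # recycle the control values down the whole column; this relies on the
  # control rows being in the same Year order as the other treatment's rows
  mutate(difference = value - rep(value[Treatment == "control"], length.out = n()))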
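The grouped version depends on every group having exactly one control row. As a rough sketch of how that could be guarded when there are several species, the example below groups by Year and Species and returns NA where the control is missing. The data frame `df2`, the result `res` and the values are hypothetical, not taken from the question.

```r
library(dplyr)

# Hypothetical data: species "y" has no control row in 2011
df2 <- data.frame(
  Year      = c(2010L, 2010L, 2011L, 2011L, 2010L, 2010L, 2011L),
  Species   = c("x", "x", "x", "x", "y", "y", "y"),
  Treatment = c("control", "MaxDamage", "control", "MaxDamage",
                "control", "MaxDamage", "MaxDamage"),
  value     = c(0.04, 0.02, 0.10, 0.06, 0.50, 0.40, 0.90),
  stringsAsFactors = FALSE
)

res <- df2 %>%
  group_by(Year, Species) %>%
  mutate(
    # subtract the control only when the group has exactly one control row;
    # otherwise fill the group with NA instead of failing
    difference = if (sum(Treatment == "control") == 1) {
      value - value[Treatment == "control"]
    } else {
      NA_real_
    }
  ) %>%
  ungroup()
```

Without the check, a group with no control row would make `value[Treatment == "control"]` empty and `mutate()` would stop with a size error.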
Upvotes: 3