Reputation: 77
I have the following dataset (5200 entries) of water chemical compounds annual means organized by monitoring sites and sampling year:
data= data.frame(Site_ID=c(1, 1, 1, 2, 2, 2, 3, 3, 3), Year=c(2000, 2001, 2002, 2001, 2002, 2003, 2002, 2003, 2004), AnnualMean=c(1.1, 1.2, 1.1, 2.1, 2.6, 3.1, 2.7, 2.6, 1.9))
I would like to add a new column to this data frame that contains the difference between the annual mean of a given year and a defined reference year, per monitoring site. Each monitoring site share the same reference year (here 2002). The resulting table would look like:
data= data.frame(Site_ID=c(1, 1, 1, 2, 2, 2, 3, 3, 3), Year=c(2000, 2001, 2002, 2001, 2002, 2003, 2002, 2003, 2004), AnnualMean=c(1.1, 1.2, 1.1, 2.1, 2.6, 3.1, 2.7, 2.6, 1.9), Diff_ref=c(0.0, 0.1, NA, -0.5, NA, 0.5, NA, -0.1, -0.8)
Thank you very much !
Upvotes: 4
Views: 56
Reputation: 887118
Using data.table
library(data.table)
setDT(data)[, Diff_ref := fifelse(Year == 2002, NA_real_,
AnnualMean - AnnualMean[Year == 2002]), Site_ID]
Upvotes: 0
Reputation: 39858
One dplyr
solution could be:
data %>%
group_by(Site_ID) %>%
mutate(Diff_ref = ifelse(Year == 2002, NA, AnnualMean - AnnualMean[Year == 2002]))
Site_ID Year AnnualMean Diff_ref
<dbl> <dbl> <dbl> <dbl>
1 1 2000 1.1 0
2 1 2001 1.2 0.1000
3 1 2002 1.1 NA
4 2 2001 2.1 -0.5
5 2 2002 2.6 NA
6 2 2003 3.1 0.5
7 3 2002 2.7 NA
8 3 2003 2.6 -0.1
9 3 2004 1.9 -0.8
Upvotes: 3