Add new column containing the difference with a fixed reference value in R

Question

I have the following dataset (5200 entries) of water chemical compounds annual means organized by monitoring sites and sampling year:

data= data.frame(Site_ID=c(1, 1, 1, 2, 2, 2, 3, 3, 3), Year=c(2000, 2001, 2002, 2001, 2002, 2003, 2002, 2003, 2004), AnnualMean=c(1.1, 1.2, 1.1, 2.1, 2.6, 3.1, 2.7, 2.6, 1.9))

I would like to add a new column to this data frame that contains the difference between the annual mean of a given year and a defined reference year, per monitoring site. Each monitoring site share the same reference year (here 2002). The resulting table would look like:

data= data.frame(Site_ID=c(1, 1, 1, 2, 2, 2, 3, 3, 3), Year=c(2000, 2001, 2002, 2001, 2002, 2003, 2002, 2003, 2004), AnnualMean=c(1.1, 1.2, 1.1, 2.1, 2.6, 3.1, 2.7, 2.6, 1.9), Diff_ref=c(0.0, 0.1, NA, -0.5, NA, 0.5, NA, -0.1, -0.8)

Thank you very much !

tmfmnk · Accepted Answer

One dplyr solution could be:

data %>%
 group_by(Site_ID) %>%
 mutate(Diff_ref = ifelse(Year == 2002, NA, AnnualMean - AnnualMean[Year == 2002]))

  Site_ID  Year AnnualMean Diff_ref
               
1       1  2000        1.1   0     
2       1  2001        1.2   0.1000
3       1  2002        1.1  NA     
4       2  2001        2.1  -0.5   
5       2  2002        2.6  NA     
6       2  2003        3.1   0.5   
7       3  2002        2.7  NA     
8       3  2003        2.6  -0.1   
9       3  2004        1.9  -0.8

Add new column containing the difference with a fixed reference value in R

Answers (2)

Related Questions