ALEXIS CAZORLA
ALEXIS CAZORLA

Reputation: 77

Add new column containing the difference with a fixed reference value in R

I have the following dataset (5200 entries) of water chemical compounds annual means organized by monitoring sites and sampling year:

data= data.frame(Site_ID=c(1, 1, 1, 2, 2, 2, 3, 3, 3), Year=c(2000, 2001, 2002, 2001, 2002, 2003, 2002, 2003, 2004), AnnualMean=c(1.1, 1.2, 1.1, 2.1, 2.6, 3.1, 2.7, 2.6, 1.9))

I would like to add a new column to this data frame that contains the difference between the annual mean of a given year and a defined reference year, per monitoring site. Each monitoring site share the same reference year (here 2002). The resulting table would look like:

data= data.frame(Site_ID=c(1, 1, 1, 2, 2, 2, 3, 3, 3), Year=c(2000, 2001, 2002, 2001, 2002, 2003, 2002, 2003, 2004), AnnualMean=c(1.1, 1.2, 1.1, 2.1, 2.6, 3.1, 2.7, 2.6, 1.9), Diff_ref=c(0.0, 0.1, NA, -0.5, NA, 0.5, NA, -0.1, -0.8)

Thank you very much !

Upvotes: 4

Views: 56

Answers (2)

akrun
akrun

Reputation: 887118

Using data.table

library(data.table)
setDT(data)[, Diff_ref := fifelse(Year == 2002, NA_real_,
                 AnnualMean - AnnualMean[Year == 2002]), Site_ID]

Upvotes: 0

tmfmnk
tmfmnk

Reputation: 39858

One dplyr solution could be:

data %>%
 group_by(Site_ID) %>%
 mutate(Diff_ref = ifelse(Year == 2002, NA, AnnualMean - AnnualMean[Year == 2002]))

  Site_ID  Year AnnualMean Diff_ref
    <dbl> <dbl>      <dbl>    <dbl>
1       1  2000        1.1   0     
2       1  2001        1.2   0.1000
3       1  2002        1.1  NA     
4       2  2001        2.1  -0.5   
5       2  2002        2.6  NA     
6       2  2003        3.1   0.5   
7       3  2002        2.7  NA     
8       3  2003        2.6  -0.1   
9       3  2004        1.9  -0.8 

Upvotes: 3

Related Questions