Marco

Reputation: 2817

How to use dplyr lag() to smooth minor changes in a variable

I have grouped data and a variable that I would like to smooth within each group. If the absolute change from one observation to the next is small (e.g. 5 or less), I consider it measurement error and want to carry the old value forward instead. Within each group, the first measurement serves as the default, so I assume the first observation per group is always correct (which is debatable).

library(dplyr)

set.seed(5)
# year has 7 values and is recycled across the 14 rows (7 per group)
mydata = data.frame(group = c(1,1,1,1,1,1,1,2,2,2,2,2,2,2),
                    year = seq(from = 2003, to = 2009, by = 1),
                    variable = round(runif(14, min = -5, max = 15), 0))

mydata %>%
  filter(variable > 0) %>%
  group_by(group) %>%
  mutate(smooth5 = ifelse( abs( lag(variable, n = 1, default = first(variable)) - variable ) <= 5 , variable, 5)) %>%       
  select(group, year, variable, smooth5) %>%
  arrange(group)

# A tibble: 10 x 4
# Groups:   group [2]
   group  year variable smooth5
   <dbl> <dbl>    <dbl>   <dbl>
 1     1  2004        9       9
 2     1  2005       13      13  # <- the change from 9 is |4|, so this should keep the old value 9
 3     1  2006        1       5  # <- the change from 13 to 1 is large enough to be real, so this should be 1
 4     1  2008        9       5
 5     1  2009        6       6
 6     2  2003       11      11
 7     2  2004       14      14
 8     2  2007        5       5
 9     2  2008        1       1
10     2  2009        6       6

Upvotes: 1

Views: 194

Answers (1)

Bas

Reputation: 4658

You are close, but the yes and no arguments of your ifelse() call are wrong. Below, I added a new variable previous for clarity. If abs(previous - variable) <= 5, you want previous; otherwise you want variable:

mydata %>%
  filter(variable > 0) %>%
  group_by(group) %>%
  mutate(previous = lag(variable, n = 1, default = first(variable)),
         smooth5 = ifelse(abs(previous - variable) <= 5, previous, variable)) %>%       
  select(group, year, variable, smooth5) %>%
  arrange(group)

which gives

# A tibble: 10 x 4
# Groups:   group [2]
   group  year variable smooth5
   <dbl> <dbl>    <dbl>   <dbl>
 1     1  2004        9       9
 2     1  2005       13       9
 3     1  2006        1       1
 4     1  2008        9       9
 5     1  2009        6       9
 6     2  2003       11      11
 7     2  2004       14      11
 8     2  2007        5       5
 9     2  2008        1       5
10     2  2009        6       1
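
Note that the code above compares each value with the previous raw value, which is why the last row of group 2 becomes 1 rather than the 5 that was carried forward one row earlier. If the intent is to compare against the previously smoothed value instead, lag() alone is not enough, because each result depends on the one before it. A minimal sketch of that variant, assuming the same threshold of 5; roll_forward is a hypothetical helper name, not something from dplyr or from the code above:

```r
# Hypothetical helper (not part of dplyr): keep the last accepted value
# whenever the next observation is within `tol` of it, otherwise accept
# the new observation.
roll_forward <- function(x, tol = 5) {
  Reduce(function(prev, cur) if (abs(prev - cur) <= tol) prev else cur,
         x, accumulate = TRUE)
}

# Filtered values of group 2 from the output above
roll_forward(c(11, 14, 5, 1, 6))
# [1] 11 11  5  5  5
```

Inside the grouped pipeline this could be used as mutate(smooth5 = roll_forward(variable)). For group 1 it gives the same result as the lag() version (9, 9, 1, 9, 9); the two only differ when a value follows one that was itself replaced.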

Upvotes: 1
