Joost Maxen
Joost Maxen

Reputation: 189

Lag value to two rows R

I want to lag a value to two waves that have a

This is the data:

df <- data.frame(wave = c(16, 20, 20, 36, 40, 40),
                 area = c(1,1,1,1,1,1),
                 party = c("A", "A", "A", "B", "B", "B"),
                 loyal = c(NA, 0, 1, NA, 0, 1),
                 ipref_1 = c(0.25, 0.25, 0.5, -0.25, 0.5, -0.25),
                 ipref_2 = c(0.5, 0.25, -0.25, -0.75, 0.5, -0.25),
                 ipref_3 = c(-0.5, -0.25, 0, 0.5, -0.25, 0))

Which looks like:

wave area party loyal ipref_1 ipref_2 ipref_3
16    1     A    NA    0.25    0.50   -0.50
20    1     A     0    0.25    0.25   -0.25
20    1     A     1    0.50   -0.25    0.00
36    1     B    NA   -0.25   -0.75    0.50
40    1     B     0    0.50    0.50   -0.25
40    1     B     1   -0.25   -0.25    0.00

I want to lag ipref_1, ipref_2, and ipref_3 of waves 16 and 36 to waves 20 and 40, respectively. But, there is an additional variable loyal with scores 0 and 1 in the later wave. I want to lag the value of wave 16 (and 36) to both rows of wave 20 (and 40).

I tried:

df <- df %>% group_by(party, area) %>%
  mutate(l_ipref_1 = lag(ipref_1, 1, order_by = wave, na.rm =T),
         l_ipref_2 = lag(ipref_2, 1, order_by = wave, na.rm =T),
         l_ipref_3 = lag(ipref_3, 1, order_by = wave, na.rm =T))

But it gives me:

wave  area party loyal ipref_1 ipref_2 ipref_3 l_ipref_1 l_ipref_2 l_ipref_3
<dbl> <dbl> <chr> <dbl>   <dbl>   <dbl>   <dbl>     <dbl>     <dbl>     <dbl>
16     1 A        NA    0.25    0.5    -0.5      NA        NA        NA   
20     1 A         0    0.25    0.25   -0.25      0.25      0.5      -0.5 
20     1 A         1    0.5    -0.25    0         0.25      0.25     -0.25
36     1 B        NA   -0.25   -0.75    0.5      NA        NA        NA   
40     1 B         0    0.5     0.5    -0.25     -0.25     -0.75      0.5 
40     1 B         1   -0.25   -0.25    0         0.5       0.5      -0.25

Here, in wave 20 (and 40) the score of loyal = 0 is lagged on loyal = 1, but what I need is:

wave  area party loyal ipref_1 ipref_2 ipref_3 l_ipref_1 l_ipref_2 l_ipref_3
<dbl> <dbl> <chr> <dbl>   <dbl>   <dbl>   <dbl>     <dbl>     <dbl>     <dbl>
16     1 A        NA    0.25    0.5    -0.5      NA        NA        NA   
20     1 A         0    0.25    0.25   -0.25      0.25      0.5      -0.5 
20     1 A         1    0.5    -0.25    0         0.25      0.5      -0.5
36     1 B        NA   -0.25   -0.75    0.5      NA        NA        NA   
40     1 B         0    0.5     0.5    -0.25     -0.25     -0.75      0.5 
40     1 B         1   -0.25   -0.25    0        -0.25     -0.75      0.5

Any suggestions? Many thanks

Upvotes: 0

Views: 86

Answers (1)

Wimpel
Wimpel

Reputation: 27792

here is a data.table approach

the core of the apporach is the ifelse-statement; ifelse(is.na(loyal), NA, x[1]). Which means: if is.na(loyal) == TRUE, then return value NA, if not, return the first value of the ipref-column (from the current group).

Solution words for any number of columns starting with ipref

library(data.table)
cols <- grep("^ipref", names(df), value = TRUE)
setDT(df)[, paste0("l_", cols) := 
            lapply(.SD, function(x) ifelse(is.na(loyal), NA, x[1])), 
          by = .(area, party), .SDcols = cols][]
#    wave area party loyal ipref_1 ipref_2 ipref_3 l_ipref_1 l_ipref_2 l_ipref_3
# 1:   16    1     A    NA    0.25    0.50   -0.50        NA        NA        NA
# 2:   20    1     A     0    0.25    0.25   -0.25      0.25      0.50      -0.5
# 3:   20    1     A     1    0.50   -0.25    0.00      0.25      0.50      -0.5
# 4:   36    1     B    NA   -0.25   -0.75    0.50        NA        NA        NA
# 5:   40    1     B     0    0.50    0.50   -0.25     -0.25     -0.75       0.5
# 6:   40    1     B     1   -0.25   -0.25    0.00     -0.25     -0.75       0.5

Upvotes: 2

Related Questions