fromtheloam
fromtheloam

Reputation: 455

How to add value to column using dplyr lag() based on prior row value in same column

I am trying to modify a column with test values (1 or 0) based on the prior row of the same column and the value in another column. It is not a problem for one or two rows, but I need to "lead" the value for up to 50 rows which is cumbersome to code.

df <- data.frame(matrix(nrow = 10, ncol = 3))
names(df) <- c("letter", "color", "test")
df[, 1] <- c("a", "a", "b", "b", "b", "b", "b", "c", "c", "c")
df[1:2, 2] <- c("blue")
df[8:10, 2] <- c("green")
df[, 3] <- c(0, 0, 1, 0, 0, 0, 0, 0, 0, 0)

df:

   letter color test
1       a  blue    0
2       a  blue    0
3       b  <NA>    1
4       b  <NA>    0
5       b  <NA>    0
6       b  <NA>    0
7       b  <NA>    0
8       c green    0
9       c green    0
10      c green    0

My actual data is rather hard to recreate for this question, but this df is where I'm at after the first bit of coding - I've identified that the "1" in the test column (df[3, 3]) is where my problem starts. I now need to "lead" the 1 in the test column until the "color" is not NA and the "letter" is not "b".

Code such as the following will not suffice (even though my example makes it look like it will):

... mutate(test = ifelse(letter == "b" & is.na(color), 1, test)

For my purposes, I need to use the following code to "lead" the 1:

df <- df %>% 
  mutate(test = ifelse(lag(test) == 1 & is.na(color), 1, test))

However, running this code will only populate the single next row:

   letter color test
1       a  blue    0
2       a  blue    0
3       b  <NA>    1
4       b  <NA>    1
5       b  <NA>    0
6       b  <NA>    0
7       b  <NA>    0
8       c green    0
9       c green    0
10      c green    0

To accomplish this, I have been writing the same mutate command like 50 times (running over the same column). For example:

df <- df %>% 
  mutate(test = ifelse(lag(test) == 1 & is.na(color), 1, test), 
         test = ifelse(lag(test) == 1 & is.na(color), 1, test), 
         test = ifelse(lag(test) == 1 & is.na(color), 1, test), 
         test = ifelse(lag(test) == 1 & is.na(color), 1, test), 
         test = ifelse(lag(test) == 1 & is.na(color), 1, test))

This returns the desired result:

  letter color test
1       a  blue    0
2       a  blue    0
3       b  <NA>    1
4       b  <NA>    1
5       b  <NA>    1
6       b  <NA>    1
7       b  <NA>    1
8       c green    0
9       c green    0
10      c green    0

But this is quite clumsy and potentially leaves the possibility open that I did not write enough duplicate mutate commands.

Is there a way to use dplyr (or another method) to accomplish this? A for loop could probably get the job done, but I'm not sure it's necessary. Additionally, I am still learning R and have not really been able to figure out the apply family of functions. Could an apply variant be used in this situation?

Upvotes: 0

Views: 1363

Answers (2)

Sotos
Sotos

Reputation: 51582

You might be over complicating everything with lag. It seems that by grouping by letter you can simply fill all the rest of the positions (after 1) where your color is NA.

library(tidyverse)

df %>% 
 group_by(letter) %>% 
 mutate(test = replace(test, is.na(color) & test != 1, NA)) %>% 
 fill(test)

which gives,

# A tibble: 10 x 3
# Groups:   letter [3]
   letter color  test
    <chr> <chr> <dbl>
 1      a  blue     0
 2      a  blue     0
 3      b  <NA>     1
 4      b  <NA>     1
 5      b  <NA>     1
 6      b  <NA>     1
 7      b  <NA>     1
 8      c green     0
 9      c green     0
10      c green     0

Upvotes: 1

Prem
Prem

Reputation: 11955

If I understood you correctly then considering your example you can achieve the end result using

library(dplyr)
df %>% group_by(letter, color) %>%
  mutate(test = first(test))


#sample data
> dput(df)
structure(list(letter = structure(c(1L, 1L, 2L, 2L, 2L, 2L, 2L, 
3L, 3L, 3L), .Label = c("a", "b", "c"), class = "factor"), color = structure(c(2L, 
2L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L), .Label = c("<NA>", "blue", 
"green"), class = "factor"), test = c(0L, 0L, 1L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L)), .Names = c("letter", "color", "test"), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10"))

Upvotes: 1

Related Questions