Reputation: 455
I am trying to modify a column with test values (1 or 0) based on the prior row of the same column and the value in another column. It is not a problem for one or two rows, but I need to "lead" the value for up to 50 rows which is cumbersome to code.
df <- data.frame(matrix(nrow = 10, ncol = 3))
names(df) <- c("letter", "color", "test")
df[, 1] <- c("a", "a", "b", "b", "b", "b", "b", "c", "c", "c")
df[1:2, 2] <- c("blue")
df[8:10, 2] <- c("green")
df[, 3] <- c(0, 0, 1, 0, 0, 0, 0, 0, 0, 0)
df:
letter color test
1 a blue 0
2 a blue 0
3 b <NA> 1
4 b <NA> 0
5 b <NA> 0
6 b <NA> 0
7 b <NA> 0
8 c green 0
9 c green 0
10 c green 0
My actual data is rather hard to recreate for this question, but this df
is where I'm at after the first bit of coding - I've identified that the "1" in the test column (df[3, 3]
) is where my problem starts. I now need to "lead" the 1 in the test column until the "color" is not NA
and the "letter" is not "b".
Code such as the following will not suffice (even though my example makes it look like it will):
... mutate(test = ifelse(letter == "b" & is.na(color), 1, test)
For my purposes, I need to use the following code to "lead" the 1:
df <- df %>%
mutate(test = ifelse(lag(test) == 1 & is.na(color), 1, test))
However, running this code will only populate the single next row:
letter color test
1 a blue 0
2 a blue 0
3 b <NA> 1
4 b <NA> 1
5 b <NA> 0
6 b <NA> 0
7 b <NA> 0
8 c green 0
9 c green 0
10 c green 0
To accomplish this, I have been writing the same mutate
command like 50 times (running over the same column). For example:
df <- df %>%
mutate(test = ifelse(lag(test) == 1 & is.na(color), 1, test),
test = ifelse(lag(test) == 1 & is.na(color), 1, test),
test = ifelse(lag(test) == 1 & is.na(color), 1, test),
test = ifelse(lag(test) == 1 & is.na(color), 1, test),
test = ifelse(lag(test) == 1 & is.na(color), 1, test))
This returns the desired result:
letter color test
1 a blue 0
2 a blue 0
3 b <NA> 1
4 b <NA> 1
5 b <NA> 1
6 b <NA> 1
7 b <NA> 1
8 c green 0
9 c green 0
10 c green 0
But this is quite clumsy and potentially leaves the possibility open that I did not write enough duplicate mutate
commands.
Is there a way to use dplyr
(or another method) to accomplish this? A for loop could probably get the job done, but I'm not sure it's necessary. Additionally, I am still learning R and have not really been able to figure out the apply
family of functions. Could an apply
variant be used in this situation?
Upvotes: 0
Views: 1363
Reputation: 51582
You might be over complicating everything with lag
. It seems that by grouping by letter
you can simply fill
all the rest of the positions (after 1) where your color is NA
.
library(tidyverse)
df %>%
group_by(letter) %>%
mutate(test = replace(test, is.na(color) & test != 1, NA)) %>%
fill(test)
which gives,
# A tibble: 10 x 3 # Groups: letter [3] letter color test <chr> <chr> <dbl> 1 a blue 0 2 a blue 0 3 b <NA> 1 4 b <NA> 1 5 b <NA> 1 6 b <NA> 1 7 b <NA> 1 8 c green 0 9 c green 0 10 c green 0
Upvotes: 1
Reputation: 11955
If I understood you correctly then considering your example you can achieve the end result using
library(dplyr)
df %>% group_by(letter, color) %>%
mutate(test = first(test))
#sample data
> dput(df)
structure(list(letter = structure(c(1L, 1L, 2L, 2L, 2L, 2L, 2L,
3L, 3L, 3L), .Label = c("a", "b", "c"), class = "factor"), color = structure(c(2L,
2L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L), .Label = c("<NA>", "blue",
"green"), class = "factor"), test = c(0L, 0L, 1L, 0L, 0L, 0L,
0L, 0L, 0L, 0L)), .Names = c("letter", "color", "test"), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10"))
Upvotes: 1