Reputation: 566
I seem to have stumbled upon a `mutate`/`lag`/`ifelse` behaviour that I cannot explain. I have the following (simplified) dataframe:
test <- data.frame(type = c("START", "END", "START", "START", "START", "START", "START", "END"),
                   stringsAsFactors = FALSE)
> test
type
1 START
2 END
3 START
4 START
5 START
6 START
7 START
8 END
I would like to modify the column `type` in order to have a sequence of alternating `START` and `END` pairs (note that in the `test` dataframe only sequences of `START` are possible, `END` is never repeated):
> desired
type
1 START
2 END
3 START
4 END
5 START
6 END
7 START
8 END
I thought I could achieve my goal with the following code:
library(dplyr)

test %>%
  mutate(type = ifelse(type == "START" &
                         dplyr::lag(type, n = 1, default = "END") == "START" &
                         dplyr::lead(type, n = 1, default = "END") == "START",
                       "END", type))
The code should detect rows in which `START` is preceded by a `START` and followed by a `START`, in which case the `type` value is changed to `END`. After this change, the following `START` (row number 5 of `test`) should not be matched, since its previous `type` value is now `END`. Unfortunately, the output of the command is the following:
type
1 START
2 END
3 START
4 END
5 END
6 END
7 START
8 END
It's as if the value seen by `lag` is not affected by `mutate`. Is this how it is supposed to work? Is there a way to code it so that `lag` sees the effects of `mutate` on the previous row?
Versions: R version 3.2.3 (2015-12-10), dplyr_0.4.3
UPDATE: The reason why the above code doesn't work is explained by Paul Rougieux below: `lead` and `lag` are computed once, from the original column, and do not take into account modifications made to `type` within the same `mutate()` call. So I guess the correct answer is "it cannot be done straightforwardly using dplyr".
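For illustration, here is a minimal base R sketch (no dplyr) of the row-by-row behaviour I was after: a plain loop in which each row looks at the already-updated previous row and at the original next row. It uses the eight-row data printed above; the names `fixed`, `prev_val` and `next_val` are made up for this sketch.

```r
# Eight-row data as printed in the question
type <- c("START", "END", "START", "START", "START", "START", "START", "END")

fixed <- type
n <- length(fixed)

for (i in seq_len(n)) {
  # Previous row as it stands after earlier iterations ("END" before row 1)
  prev_val <- if (i == 1) "END" else fixed[i - 1]
  # Next row from the original data ("END" after the last row)
  next_val <- if (i == n) "END" else type[i + 1]

  if (fixed[i] == "START" && prev_val == "START" && next_val == "START") {
    fixed[i] <- "END"
  }
}

fixed
# "START" "END" "START" "END" "START" "END" "START" "END"
```

Because row 4 is rewritten before row 5 is examined, row 5 no longer matches, which is exactly what the vectorised `ifelse()` call cannot do.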
Upvotes: 2
Views: 1421
Reputation: 9618
How about this?
test$type[test$type != c("START", "END")] <-
ifelse(test$type[test$type != c("START", "END")] == "START", "END", "START")
test
type
1 START
2 END
3 START
4 END
5 START
6 END
7 START
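(The warnings can be ignored: they appear because the two-element vector c("START", "END") is recycled along a column with an odd number of rows.)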
Upvotes: 0
Reputation: 11409
Defining lag and lead variables separately in `mutate()` will show you that your call to `ifelse(type == "START" & lag == "START" & lead == "START", "END" , type)` is not going to work:
test <- data.frame(type = c("START", "END", "START", "START", "START", "START", "END"), stringsAsFactors = FALSE)
test %>%
  mutate(lag = dplyr::lag(type, n = 1, default = "END"),
         lead = dplyr::lead(type, n = 1, default = "END"),
         type2 = ifelse(type == "START" & lag == "START" & lead == "START",
                        "END", type))
# type lag lead type2
#1 START END END START
#2 END START START END
#3 START END START START
#4 START START START END
#5 START START START END
#6 START START END START
#7 END START END END
`dplyr::mutate()` modifies the vector as a whole. `lead` and `lag` are computed once, from the original `type` vector, and do not take into account further modifications to it. What you want is a `Reduce()` function in this case. Check `help(Reduce)`.
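A minimal sketch of what that could look like, assuming the rule from the question is applied row by row and using the eight-row data printed there. The helper name `alternate_starts` is hypothetical. `Reduce()` is called with `accumulate = TRUE`, so the first argument of the folding function is the value already produced for the previous row, while the lead is still taken from the original vector.

```r
# Hypothetical helper: apply the question's rule sequentially with Reduce()
alternate_starts <- function(type) {
  lead <- c(type[-1], "END")           # original next value, "END" after the last row
  out <- Reduce(
    function(prev, i) {
      # prev is the already-updated value of the previous row
      if (type[i] == "START" && prev == "START" && lead[i] == "START") {
        "END"
      } else {
        type[i]
      }
    },
    seq_along(type),
    accumulate = TRUE,
    "END"                              # init: plays the role of default = "END" in lag
  )
  out[-1]                              # drop the init value
}

type <- c("START", "END", "START", "START", "START", "START", "START", "END")
alternate_starts(type)
# "START" "END" "START" "END" "START" "END" "START" "END"
```

Since the helper takes and returns a plain character vector, it should be usable inside `mutate(type = alternate_starts(type))`. Note that the rule itself only alternates cleanly when a run of `START` has odd length: on the seven-row `test` defined in this answer it returns START, END, START, END, START, START, END.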
Upvotes: 1