domi

Reputation: 566

Lag doesn't see the effects of mutate on previous rows

I seem to have stumbled upon a mutate/lag/ifelse behaviour that I cannot explain. I have the following (simplified) dataframe:

test <- data.frame(type = c("START", "END", "START", "START", "START", "START", "START", "END"),
                   stringsAsFactors = FALSE)

> test

  type
1 START
2   END
3 START
4 START
5 START
6 START
7 START
8   END

I would like to modify the type column so that it contains alternating START/END pairs (note that in test only runs of START can occur; END is never repeated):

> desired

  type
1 START
2   END
3 START
4   END
5 START
6   END
7 START
8   END

I thought I could achieve my goal with the following code:

library(dplyr)

test %>%
  mutate(type = ifelse(type == "START" &
                         dplyr::lag(type, n = 1, default = "END") == "START" &
                         dplyr::lead(type, n = 1, default = "END") == "START",
                       "END", type))

The code is meant to detect rows where a START is both preceded and followed by a START, and to change their type to END. After row 4 has been changed, the next START (row 5 of test) should no longer match, since its previous value is now END. Unfortunately, the output of the command is:

   type
1 START
2   END
3 START
4   END
5   END
6   END
7 START
8   END 

It looks as if the values seen by lag are not affected by mutate. Is this the intended behaviour? Is there a way to write this so that lag sees the effect of mutate on the previous row?

Versions: R version 3.2.3 (2015-12-10), dplyr_0.4.3

UPDATE: Paul Rougieux explains below why the code above doesn't work: lead and lag are computed once and do not take later modifications into account. So I guess the answer is "it cannot be done straightforwardly with dplyr".
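For comparison, a minimal base-R sketch that applies the same rule row by row does let each row see the updated previous value. The function name fix_start_runs is hypothetical; as in the dplyr attempt, the "next" value is read from the original column and both boundaries default to "END":

```r
# Hypothetical helper: apply the START/START/START -> END rule sequentially,
# so each row is compared with the *updated* value of the previous row.
fix_start_runs <- function(type) {
  n <- length(type)
  out <- type
  for (i in seq_len(n)) {
    prev <- if (i == 1) "END" else out[i - 1]   # updated previous value (like lag's default)
    nxt  <- if (i == n) "END" else type[i + 1]  # original next value (like lead's default)
    if (type[i] == "START" && prev == "START" && nxt == "START") {
      out[i] <- "END"
    }
  }
  out
}

test <- data.frame(type = c("START", "END", "START", "START", "START",
                            "START", "START", "END"),
                   stringsAsFactors = FALSE)
test$type <- fix_start_runs(test$type)
identical(test$type, rep(c("START", "END"), 4))  # TRUE
```

Because out[i - 1] is read after it may have been overwritten, row 5 now sees END in front of it and is left alone, which is exactly what the mutate() version assumed.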

Upvotes: 2

Views: 1421

Answers (2)

DatamineR

Reputation: 9618

How about this?

# Flip every value that does not match the recycled START/END pattern
test$type[test$type != c("START", "END")] <-
  ifelse(test$type[test$type != c("START", "END")] == "START", "END", "START")

test
   type
1 START
2   END
3 START
4   END
5 START
6   END
7 START

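(The warnings, "longer object length is not a multiple of shorter object length", come from recycling c("START", "END") over the 7 rows and can be ignored.)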
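A warning-free variant of the same idea, assuming the target really is a strict START/END alternation starting with START, builds the comparison pattern with rep_len() so that both sides of != have the same length (the names expected and mismatch are just illustrative):

```r
test <- data.frame(type = c("START", "END", "START", "START", "START", "START", "END"),
                   stringsAsFactors = FALSE)
# Pattern with exactly nrow(test) elements, so no recycling warning is raised
expected <- rep_len(c("START", "END"), nrow(test))
mismatch <- test$type != expected
# Flip only the values that break the alternation
test$type[mismatch] <- ifelse(test$type[mismatch] == "START", "END", "START")
test$type
```

Since the column only holds "START" and "END", flipping every mismatch makes it identical to expected, so this amounts to overwriting type with the alternating pattern.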

Upvotes: 0

Paul Rougieux
Paul Rougieux

Reputation: 11409

Defining the lag and lead values as separate columns in mutate() shows why your call to ifelse(type == "START" & lag == "START" & lead == "START", "END", type) does not work:

library(dplyr)

test <- data.frame(type = c("START", "END", "START", "START", "START", "START", "END"),
                   stringsAsFactors = FALSE)
test %>%
    mutate(lag = dplyr::lag(type, n = 1, default = "END"),
           lead = dplyr::lead(type, n = 1, default = "END"),
           type2 = ifelse(type == "START" & lag == "START" & lead == "START",
                          "END", type))

#   type   lag  lead type2
#1 START   END   END START
#2   END START START   END
#3 START   END START START
#4 START START START   END
#5 START START START   END
#6 START START   END START
#7   END START   END   END

dplyr::mutate() operates on the whole vector at once: lead and lag are computed from the original type vector and do not take later modifications into account. What you want in this case is the `Reduce()` function; see help(Reduce).
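A minimal sketch of the Reduce() idea, assuming the same rule as in the question (the helper name fix_start_runs_reduce is hypothetical). With accumulate = TRUE, Reduce() passes the value it returned for the previous row into the next call, so the "previous" value already includes earlier changes; the lead is computed in base R from the original column:

```r
# Hypothetical helper: 'prev' is the value Reduce() returned for the previous
# row, so it already reflects any START -> END change made there.
fix_start_runs_reduce <- function(type) {
  nxt <- c(type[-1], "END")  # base-R equivalent of dplyr::lead(type, default = "END")
  res <- Reduce(function(prev, i) {
    if (type[i] == "START" && prev == "START" && nxt[i] == "START") "END" else type[i]
  }, seq_along(type), accumulate = TRUE, init = "END")
  res[-1]  # drop the init value, which plays the role of lag()'s default
}

test <- data.frame(type = c("START", "END", "START", "START", "START",
                            "START", "START", "END"),
                   stringsAsFactors = FALSE)
test$type <- fix_start_runs_reduce(test$type)
```

Inside the original pipeline it could be called as `mutate(type = fix_start_runs_reduce(type))`. Reading the lead from the original column is fine here, because row i + 1 has not been modified yet when row i is processed.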

Upvotes: 1
