Reputation: 559
I would like to find the pattern of a 0 or 1 followed by a 2 that occurs at least three times in a row, and transform the 2s in this pattern into 1s, such as:
Input:
Y <- c(0,1,0,3,2,5,2,1,2,0,2,1,2,0,1,2,1,3,1,2,1)
Some Function findPattern that finds the pattern:
findPattern(Y)
And Outputs the following:
[1] 0 0 0 0 0 0 0 0 1 0 1 0 1 0 0 0 0 0 0 0 0
I have tried the following:
as.numeric(Y == 2 & lead(Y) %in% 1:2)
Upvotes: 1
Views: 1676
Reputation: 215127
Here is one possible approach that uses a regular expression to find the pattern.
Starting vector:
> Y
[1] 0 2 0 3 2 5 2 1 2 0 2 1 2 0 1
1) Find all the 2s preceded by 0 or 1:
> ind <- as.integer(lag(Y %in% c(0, 1)) & (Y == 2) )
> ind
[1] 0 1 0 0 0 0 0 0 1 0 1 0 1 0 0
2) Paste the resulting vector into a string and use a regular expression to find the location and length of the required pattern, i.e., 0 and 1 alternating three or more times:
> id <- gregexpr("(01){3,}", paste0(ind, collapse = ""))
> id
[[1]]
[1] 8
attr(,"match.length")
[1] 6
attr(,"useBytes")
[1] TRUE
3) Extract the location and length from the regular expression result and convert them into index positions:
> start <- as.numeric(id[[1]])
> end <- start + attr(id[[1]], "match.length") - 1
> indArray <- unlist(Map(`:`, start, end))
> indArray
[1] 8 9 10 11 12 13
4) Set to 0 every flag outside the matched positions, i.e., where the 01 pattern repeats fewer than 3 times:
> ind[-indArray] <- 0
> ind
[1] 0 0 0 0 0 0 0 0 1 0 1 0 1 0 0
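One case the walkthrough above does not exercise is a vector with no qualifying run. As a minimal sketch (the second flag string is a made-up example, and the variable names are illustrative only), this is what gregexpr() reports with and without a match, and what the index arithmetic from step 3 turns into when nothing matches:

```r
pattern <- "(01){3,}"

# Flag string from the walkthrough: one run of three "01" pairs
hit <- gregexpr(pattern, "010000001010100")[[1]]
hit_start <- as.integer(hit)                  # 8
hit_len   <- attr(hit, "match.length")        # 6

# Hypothetical flag string with a single "01" pair: no match
miss <- gregexpr(pattern, "010000000000000")[[1]]
miss_start <- as.integer(miss)                # -1
miss_len   <- attr(miss, "match.length")      # -1

# Step 3's arithmetic applied to the no-match result gives negative indices
miss_end <- miss_start + miss_len - 1         # -3
bad_idx  <- miss_start:miss_end               # -1 -2 -3
```

With these values, ind[-bad_idx] would select only positions 1 to 3, so step 4 would leave the other flags untouched. Checking for a start of -1 before step 4 is probably the simplest guard.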
Wrap them into a function:
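library(dplyr)

findPattern <- function(Y) {
  # 1 where a 2 directly follows a 0 or 1; default = FALSE avoids an NA at position 1
  ind <- as.integer(lag(Y %in% c(0, 1), default = FALSE) & (Y == 2))
  # Locate runs of three or more consecutive "01" pairs in the flag string
  id <- gregexpr("(01){3,}", paste0(ind, collapse = ""))[[1]]
  # gregexpr() returns -1 when nothing matches, so no flag survives
  if (id[1] == -1) return(rep(0L, length(Y)))
  start <- as.integer(id)
  end <- start + attr(id, "match.length") - 1
  # Expand every match into its positions and zero the flags outside them
  indArray <- unlist(Map(`:`, start, end))
  ind[-indArray] <- 0L
  ind
}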
Upvotes: 1
Reputation: 1237
findPattern<-function(Y){
as.numeric(Y==2 & (c(NA,Y[-length(Y)])==0 |c(NA,Y[-length(Y)])==1 ))
}
I add an NA at the start and remove the last item, so that the vector is "shifted" by one position but keeps the same length. This way you avoid for loops.
If you want to use %in%, which avoids a second comparison:
findPattern <- function(Y) {
  as.numeric(Y == 2 & (c(NA, Y[-length(Y)]) %in% c(0, 1)))
}
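To make the shift concrete, here is a small sketch on a made-up vector (the names prev, by_equals, and by_in are illustrative only). It also shows one difference between the two variants: comparing NA with == yields NA, whereas %in% yields FALSE, so the results differ when the first element is a 2.

```r
Y <- c(2, 1, 2, 0, 2)   # hypothetical input that starts with a 2

# Shift right by one: NA in front, last element dropped, same length
prev <- c(NA, Y[-length(Y)])                                # NA 2 1 2 0

# == propagates the NA for the first element
by_equals <- as.numeric(Y == 2 & (prev == 0 | prev == 1))   # NA 0 1 0 1

# %in% treats NA as "not in the set" and returns FALSE instead
by_in <- as.numeric(Y == 2 & (prev %in% c(0, 1)))           # 0 0 1 0 1
```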
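findPattern <- function(Y) {
  # Positions of every 2 that directly follows a 0 or 1
  w <- which(Y == 2 & (c(NA, Y[-length(Y)]) %in% c(0, 1)))
  # Keep positions with another such 2 two places before and two places after
  centers <- w[((w - 2) %in% w) & ((w + 2) %in% w)]
  result <- rep(0, times = length(Y))
  # Mark each center and its two neighbours
  result[c(centers, centers - 2, centers + 2)] <- 1
  return(result)
}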
Testing:
findPattern(c(0,1,0,3,2,5,2,1,2,0,2,1,2,0,1,2,1,3,1,2,1))
[1] 0 0 0 0 0 0 0 0 1 0 1 0 1 0 0 0 0 0 0 0 0
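To see why only positions 9, 11, and 13 survive, the intermediate values of the last function can be traced on the test vector. This is a sketch that repeats the function's own expressions outside the function body; the name marked is illustrative only.

```r
Y <- c(0,1,0,3,2,5,2,1,2,0,2,1,2,0,1,2,1,3,1,2,1)

# Positions of every 2 whose predecessor is 0 or 1
w <- which(Y == 2 & (c(NA, Y[-length(Y)]) %in% c(0, 1)))   # 9 11 13 16 20

# A "center" has another qualifying 2 two places before and two places after
centers <- w[((w - 2) %in% w) & ((w + 2) %in% w)]          # 11

# Each center marks itself and its two neighbours
marked <- sort(c(centers - 2, centers, centers + 2))       # 9 11 13
```

The isolated hits at positions 16 and 20 are dropped because neither has a qualifying 2 on both sides.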
Upvotes: 2
Reputation: 3597
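Using the stringi package:
library(stringi)
Y <- c(0,1,0,3,2,5,2,1,2,0,2,1,2,0,1)
# 1 for each element that is a 2, 0 otherwise
matchVec = stri_count(Y, fixed = 2)
# Flag every 2 from the third one onward
remapVec = as.integer(matchVec & (cumsum(matchVec) >= 3))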
remapVec
#[1] 0 0 0 0 0 0 0 0 1 0 1 0 1 0 0
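As a check on how far this generalises, the same two steps can be applied to the longer vector from the question. This sketch swaps stri_count(Y, fixed = 2) for the base R comparison Y == 2, which is an assumption that holds for single-digit values like these. It suggests that the cumulative count also flags the later 2s at positions 16 and 20, which the expected output in the question leaves at 0.

```r
Y <- c(0,1,0,3,2,5,2,1,2,0,2,1,2,0,1,2,1,3,1,2,1)

# Stand-in for stri_count(Y, fixed = 2): 1 where the element is a 2
matchVec <- as.integer(Y == 2)

# Flag every 2 from the third one onward
remapVec <- as.integer(matchVec & (cumsum(matchVec) >= 3))

flagged <- which(remapVec == 1)   # 9 11 13 16 20
```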
Upvotes: 0