user3324491

Reputation: 559

Find a numeric pattern in R

I would like to find a pattern in which a 0 or 1 followed by a 2 occurs at least three times in a row, and then transform the 2s in this pattern into 1s (and everything else into 0s). For example:

Input:

Y <- c(0,1,0,3,2,5,2,1,2,0,2,1,2,0,1,2,1,3,1,2,1)

A function findPattern that finds the pattern:

findPattern(Y)

and outputs the following:

[1] 0 0 0 0 0 0 0 0 1 0 1 0 1 0 0 0 0 0 0 0 0

I have tried the following:

as.numeric(Y == 2 & lead(Y) %in% 1:2)

Upvotes: 1

Views: 1676

Answers (3)

akuiper

Reputation: 215127

Here is one possible approach, which uses a regular expression to find the pattern.

Starting vector:

> Y
 [1] 0 2 0 3 2 5 2 1 2 0 2 1 2 0 1

1) Find all the 2s preceded by a 0 or 1 (lag() here is dplyr's, so dplyr needs to be loaded):

> ind <- as.integer(lag(Y %in% c(0, 1)) & (Y == 2) )
> ind
 [1] 0 1 0 0 0 0 0 0 1 0 1 0 1 0 0
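One thing to watch in step 1: with only base R attached, lag() resolves to stats::lag(), which adjusts the time-series attribute but leaves the values where they are. A minimal sketch of the difference, using a manual shift in place of dplyr::lag() (the names `unshifted` and `prev` are only illustrative):

```r
Y <- c(0, 2, 0, 3, 2, 5, 2, 1, 2, 0, 2, 1, 2, 0, 1)

# stats::lag() only changes the "tsp" attribute; the values do not move
unshifted <- as.vector(stats::lag(Y))

# manual shift by one position: prev[i] holds Y[i - 1]
prev <- c(NA, Y[-length(Y)])

# 1 where a 2 is immediately preceded by a 0 or 1
ind <- as.integer((prev %in% c(0, 1)) & (Y == 2))
ind
```

The manual shift gives the same `ind` as the dplyr version shown in step 1.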

2) Paste the resulting vector into a string and use a regular expression to find the location and length of the required pattern, i.e., 0 and 1 alternating three or more times:

> id <- gregexpr("(01){3,}", paste0(ind, collapse = ""))
> id
[[1]]
[1] 8
attr(,"match.length")
[1] 6
attr(,"useBytes")
[1] TRUE
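Note that gregexpr() reports "no match" as a start position of -1 rather than as an empty result, which matters for the index arithmetic that follows. A small sketch; the first string is the pasted `ind` from step 1 and the second is a made-up input with no qualifying run:

```r
pattern <- "(01){3,}"

hit  <- gregexpr(pattern, "010000001010100")[[1]]
miss <- gregexpr(pattern, "010000000000000")[[1]]

hit_start  <- as.integer(hit)              # start of the matched run
hit_length <- attr(hit, "match.length")    # number of characters matched
miss_start <- as.integer(miss)             # -1 signals "no match"

found <- miss_start[1] != -1
```

Checking for -1 before building the index ranges avoids turning that sentinel into a real index.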

3) Extract the location and length from the regular expression result and convert them into a vector of indices:

> start <- as.numeric(id[[1]])
> end <- start + attr(id[[1]], "match.length") - 1
> indArray <- unlist(Map(`:`, start, end))
> indArray
[1]  8  9 10 11 12 13

4) Set every 1 that lies outside a matched run (i.e., where the 01 pattern repeats fewer than three times) to 0:

> ind[-indArray] <- 0
> ind
 [1] 0 0 0 0 0 0 0 0 1 0 1 0 1 0 0

Wrap them into a function:

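library(dplyr)  # for lag(); stats::lag() does not shift a plain vector

findPattern <- function(Y) {
    # 1 where a 2 is immediately preceded by a 0 or 1;
    # default = FALSE keeps the first element from becoming NA
    ind <- as.integer(dplyr::lag(Y %in% c(0, 1), default = FALSE) & (Y == 2))
    # locate runs of "01" repeated three or more times
    id <- gregexpr("(01){3,}", paste0(ind, collapse = ""))[[1]]
    # gregexpr() returns -1 when nothing matches
    if (id[1] == -1) return(rep(0L, length(Y)))
    start <- as.numeric(id)
    end <- start + attr(id, "match.length") - 1
    indArray <- unlist(Map(`:`, start, end))
    # zero out every hit that lies outside a matched run
    ind[-indArray] <- 0L
    ind
}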
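The approach can also be tried without dplyr. The sketch below is a base-R variant, not the original function: `find_pattern_base` is a hypothetical name, the shift is done by hand, and it returns all zeros when gregexpr() reports no match. It assumes a non-empty input vector.

```r
# Hypothetical dependency-free variant of the regex approach
find_pattern_base <- function(Y) {
    n <- length(Y)
    prev <- c(NA, Y[-n])                        # Y shifted right by one
    ind <- as.integer((prev %in% c(0, 1)) & (Y == 2))

    m <- gregexpr("(01){3,}", paste0(ind, collapse = ""))[[1]]
    if (m[1] == -1) return(rep(0L, n))          # no run of three or more

    # logical mask of the positions covered by any match
    keep <- rep(FALSE, n)
    len <- attr(m, "match.length")
    for (i in seq_along(m)) {
        keep[m[i] + seq_len(len[i]) - 1] <- TRUE
    }
    ind[!keep] <- 0L
    ind
}

# vector from the question
Y <- c(0,1,0,3,2,5,2,1,2,0,2,1,2,0,1,2,1,3,1,2,1)
res <- find_pattern_base(Y)

# made-up input with isolated hits only
no_run <- find_pattern_base(c(0, 2, 3, 3, 1, 2))
```

Using a logical mask instead of negative indexing keeps the no-match case from touching the wrong positions.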

Upvotes: 1

DeveauP

Reputation: 1237

1. Find 0/1 followed by 2s

findPattern<-function(Y){
    as.numeric(Y==2 & (c(NA,Y[-length(Y)])==0 |c(NA,Y[-length(Y)])==1 ))
}

I add an NA at the start and remove the last item, so the vector is "shifted" by one position while keeping the same length. This way you avoid for loops.

If you want to use %in%, which avoids a second pass over the vector:

findPattern<-function(Y){
    as.numeric(Y==2 & (c(NA,Y[-length(Y)]) %in% c(0,1)))
}
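Besides saving a comparison, the two versions behave differently when the vector starts with a 2: the padded NA propagates through == but not through %in%. A small sketch with a made-up input (the variable names are only illustrative):

```r
Y <- c(2, 0, 2)
prev <- c(NA, Y[-length(Y)])    # Y shifted right by one: NA 2 0

# == comparisons: NA == 0 is NA, so the first result is NA
with_eq <- as.numeric(Y == 2 & (prev == 0 | prev == 1))

# %in% never returns NA, so the first result is 0
with_in <- as.numeric(Y == 2 & (prev %in% c(0, 1)))
```

If an NA in the output is unwanted, the %in% form is the safer of the two.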

2. Select only those that occur at least three times in a row, at every other position

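findPattern<-function(Y){
    # positions of 2s that are preceded by a 0 or 1
    w <- which(Y==2 & (c(NA,Y[-length(Y)]) %in% c(0,1)))
    # a "center" has another such 2 two positions before and two positions after
    centers<- w[((w - 2) %in% w) & ((w+2) %in% w)]
    result<-rep(0, times = length(Y))
    # mark each center together with its two neighbours
    result[c(centers,centers-2,centers+2)]<-1
    return(result)
}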

Testing:

findPattern(c(0,1,0,3,2,5,2,1,2,0,2,1,2,0,1,2,1,3,1,2,1))
[1] 0 0 0 0 0 0 0 0 1 0 1 0 1 0 0 0 0 0 0 0 0
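The "center with two neighbours" test is specific to runs of three. If a different minimum run length is ever needed, one possible generalisation is to group hits that are spaced exactly two apart and keep the groups that are long enough. This is only a sketch: `find_pattern_k` and its parameter `k` are hypothetical names, not part of the answer above.

```r
# Hypothetical generalisation: keep runs of at least k hits spaced two apart
find_pattern_k <- function(Y, k = 3) {
    prev <- c(NA, Y[-length(Y)])
    w <- which(Y == 2 & (prev %in% c(0, 1)))   # 2s preceded by 0 or 1
    result <- rep(0, times = length(Y))
    if (length(w) == 0) return(result)

    # a new group starts whenever the gap to the previous hit is not 2
    grp <- cumsum(c(TRUE, diff(w) != 2))
    # size of the group each hit belongs to
    sizes <- tabulate(grp)[grp]

    result[w[sizes >= k]] <- 1
    result
}

Y <- c(0,1,0,3,2,5,2,1,2,0,2,1,2,0,1,2,1,3,1,2,1)
res3 <- find_pattern_k(Y)          # same rule as above, k = 3
res4 <- find_pattern_k(Y, k = 4)   # no run of four in this vector
```

With k = 3 this reproduces the test output above; with k = 4 the same vector yields all zeros.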

Upvotes: 2

Silence Dogood

Reputation: 3597

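Using the stringi package:

library(stringi)
Y <- c(0,1,0,3,2,5,2,1,2,0,2,1,2,0,1)
# number of times the digit 2 occurs in each element (0 or 1 here)
matchVec = stri_count(Y,fixed=2)
# keeps every 2 from the third 2 onwards; the preceding 0/1 is not checked
remapVec = as.integer(matchVec & (cumsum(matchVec)>=3))
remapVec
#[1] 0 0 0 0 0 0 0 0 1 0 1 0 1 0 0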

Upvotes: 0
