Find first occurrence of a pattern by row

Question

I wish to locate the first instance of two patterns by row. Specifically, I want to find the first occurrence of c(1,0) and the first occurrence of c(1,1) in each row of a data.frame. The code below does this using nested for-loops, but is very slow with large data sets.

Is there a way to do this efficiently in base R? This question is similar:

Finding pattern in a matrix in R

Here is my code, which returns a 2 if the pattern begins in Column 1 and ends in Column 2 and returns a 0 if the pattern does not occur in a row.

n <- 5

my.data <- expand.grid(rep(list(1:0), n))
my.data <- my.data[do.call(order, as.list(my.data)),]
my.data <- my.data[order(nrow(my.data):1),]

first.11 <- rep(0, nrow(my.data))
first.10 <- rep(0, nrow(my.data))

for(i in 1:nrow(my.data)) {
     for(j in 1:(ncol(my.data)-1)) {

     if(first.11[i] == 0 & my.data[i,j] == 1 & my.data[i,(j+1)] == 1) first.11[i] = j+1
     if(first.10[i] == 0 & my.data[i,j] == 1 & my.data[i,(j+1)] == 0) first.10[i] = j+1

     }
}

my.data2 <- data.frame(my.data, first.11, first.10)
my.data2

#   Var1 Var2 Var3 Var4 Var5 first.11 first.10
#1     1    1    1    1    1        2        0
#17    1    1    1    1    0        2        5
#9     1    1    1    0    1        2        4
#25    1    1    1    0    0        2        4
#5     1    1    0    1    1        2        3
#21    1    1    0    1    0        2        3
#13    1    1    0    0    1        2        3
#29    1    1    0    0    0        2        3
#3     1    0    1    1    1        4        2
#19    1    0    1    1    0        4        2
#11    1    0    1    0    1        0        2
#27    1    0    1    0    0        0        2
#7     1    0    0    1    1        5        2
#23    1    0    0    1    0        0        2
#15    1    0    0    0    1        0        2
#31    1    0    0    0    0        0        2
#2     0    1    1    1    1        3        0
#18    0    1    1    1    0        3        5
#10    0    1    1    0    1        3        4
#26    0    1    1    0    0        3        4
#6     0    1    0    1    1        5        3
#22    0    1    0    1    0        0        3
#14    0    1    0    0    1        0        3
#30    0    1    0    0    0        0        3
#4     0    0    1    1    1        4        0
#20    0    0    1    1    0        4        5
#12    0    0    1    0    1        0        4
#28    0    0    1    0    0        0        4
#8     0    0    0    1    1        5        0
#24    0    0    0    1    0        0        5
#16    0    0    0    0    1        0        0
#32    0    0    0    0    0        0        0

alexis_laz · Accepted Answer

Another idea is to check following columns of previous matches along the pattern:

ff = function(x, pat)
{
    nc = ncol(x) - (length(pat) - 1L)
    ans = arrayInd(seq_len(nrow(x) * nc), c(nrow(x), nc))
    for(i in seq_along(pat)) {
        ans = ans[x[ans] == pat[[i]], ]
        ans[, 2L] = ans[, 2L] + 1L
    }
    inds = aggregate(list(ans[, 2L] - 1L), list(ans[, 1L]), min) 
    ret = integer(nrow(x))
    ret[inds[[1L]]] = inds[[2L]]
    ret
}
all.equal(ff(my.data, c(1, 1)), my.data2$first.11)
#[1] TRUE
all.equal(ff(my.data, c(1, 0)), my.data2$first.10)
#[1] TRUE

And, also, on longer patterns:

ff(my.data, c(1, 0, 1, 1))
# [1] 0 0 0 0 5 0 0 0 4 4 0 0 0 0 0 0 0 0 0 0 5 0 0 0 0 0 0 0 0 0 0 0

Find first occurrence of a pattern by row

Answers (2)

Related Questions