Manuel Ferreria

Reputation: 1232

R - Repetitions of an array in other array

I have an array sliced from a data frame, and I want to count the number of times a certain sequence appears in it.

For example

main <- c(A,B,C,A,B,V,A,B,C,D,E)
p <- c(A,B,C)
q <- c(A,B)

someFunction(main,p)
2

someFunction(main,q)
3

I've been messing around with rle, but it also counts every sub-repetition, which is undesirable.

Is there a quick solution I'm missing?

Upvotes: 1

Views: 476

Answers (4)

Prasad Chalasani

Reputation: 20282

Here's a way to do it using embed(v,n), which returns a matrix of all n-length sub-sequences of vector v:

find_x_in_y <- function(x, y) 
                   sum( apply( embed( y, length(x)), 1, 
                                  identical, rev(x)))

> find_x_in_y(p, main)
[1] 2
> find_x_in_y(q, main)
[1] 3
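
To see why rev(x) is needed, it may help to look at what embed() returns on a small input. This is only a sketch; the vector v is made up for illustration and is not part of the question:

```r
# Hypothetical small input, used only to inspect embed()'s layout
v <- c("A", "B", "C", "D")
m <- embed(v, 2)
print(m)
# Each row is one length-2 window of v, stored last element first,
# so row 1 is c("B", "A"), i.e. the window c("A", "B") reversed.
```

Because every row comes out reversed, comparing each row against rev(x) is equivalent to comparing the original window against x.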

Upvotes: 2

Andrie

Reputation: 179448

Using sapply:

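find_x_in_y <- function(x, y){
  # Number of start positions where a full window of length(x) fits in y;
  # the +1 makes sure the last window is checked as well
  n_starts <- length(y) - length(x) + 1
  if (n_starts < 1) return(0)
  sum(sapply(
      seq_len(n_starts),
      function(i) as.numeric(all(y[i:(i+length(x)-1)] == x))
  ))
}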


find_x_in_y(c("A", "B", "C"), main)
[1] 2

find_x_in_y(c("A", "B"), main)
[1] 3

Upvotes: 2

Chase

Reputation: 69201

You can use one of the regular expression tools in R since this is really a pattern matching exercise, specifically gregexpr for this question. The p and q vectors represent the search pattern and main is where we want to search for those patterns. From the help page for gregexpr:

gregexpr returns a list of the same length as text each element of which is of 
the same form as the return value for regexpr, except that the starting positions 
of every (disjoint) match are given. 

So we can take the length of the first element of the list returned by gregexpr, which holds the starting positions of the matches. We'll first collapse the vectors and then do the searching:

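someFunction <- function(haystack, needle) {
    haystack <- paste(haystack, collapse = "")
    needle <- paste(needle, collapse = "")
    # fixed = TRUE matches the needle literally rather than as a regex
    out <- gregexpr(needle, haystack, fixed = TRUE)[[1]]
    # gregexpr returns -1 when nothing matches, so length() alone would give 1
    if (out[1] == -1) return(0L)
    length(out)
}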

> someFunction(main, p)
[1] 2
> someFunction(main, q)
[1] 3

Note - you also need to put quotes around the elements of your main, p, and q vectors unless you have variables A, B, C, etc. defined.

main <- c("A","B","C","A","B","V","A","B","C","D","E")
p <- c("A","B","C")
q <- c("A","B")
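
One caveat, sketched here with made-up inputs h and n (not from the question): gregexpr reports only disjoint matches, so a pattern that overlaps itself may be counted fewer times than a sliding-window approach would count it. Collapsing with paste also assumes every element is a single character.

```r
# Illustrative inputs (hypothetical): the pattern overlaps itself
h <- c("A", "A", "A")
n <- c("A", "A")

# Literal search on the collapsed strings: matches are disjoint
hits <- gregexpr(paste(n, collapse = ""), paste(h, collapse = ""),
                 fixed = TRUE)[[1]]
regex_count <- if (hits[1] == -1) 0L else length(hits)

# Sliding-window count: every start position is tested
starts <- seq_len(length(h) - length(n) + 1)
window_count <- sum(vapply(
  starts,
  function(i) all(h[i:(i + length(n) - 1)] == n),
  logical(1)
))

regex_count   # 1
window_count  # 2
```

Which count is the right one depends on whether overlapping occurrences should be counted separately; for the p and q in the question the two approaches agree.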

Upvotes: 4

kohske

Reputation: 66852

I'm not sure if this is the best way, but you can do it recursively with:

f <- function(a,b) 
  if (length(a) > length(b)) 0 
  else all(head(b, length(a)) == a) + Recall(a, tail(b, -1))

Someone may or may not find a built-in function.
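
Since the recursion goes one level deeper per element of b, it may run into R's nesting limits on long vectors. A loop-based sketch of the same idea follows; the name count_windows is made up for illustration, and main is assumed to hold quoted strings:

```r
# Loop-based equivalent of the recursive idea; count_windows is a hypothetical name
count_windows <- function(a, b) {
  stopifnot(length(a) > 0)  # an empty pattern would never terminate
  n <- 0L
  # Drop the first element of b until it is shorter than a
  while (length(a) <= length(b)) {
    n <- n + all(head(b, length(a)) == a)
    b <- tail(b, -1)
  }
  n
}

main <- c("A", "B", "C", "A", "B", "V", "A", "B", "C", "D", "E")
count_windows(c("A", "B", "C"), main)  # 2
count_windows(c("A", "B"), main)       # 3
```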

Upvotes: 3
