Jeff Keller
Jeff Keller

Reputation: 781

Count Sequential Occurrences of Value in Vector

Given a generic vector x (which could be numeric, character, factor, etc.), I need to be able to count the number of sequential occurrences of value, including singletons.

x <- c(0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1)

In this case, the following would be returned if value == 0.

[1] 1 1 2 3

This code below works, but it is very slow when x gets large. Does anyone know of a way to speed this up? I imagine there must be a clever vectorized way of doing this.

getSequential <- function(x, value) {

  counts <- c()
  last <- FALSE

  for (i in 1:length(x)) {

    if (last & x[i] == value) {
      counts[length(counts)] <- counts[length(counts)] + 1
    } else if (x[i] == value) {
      counts <- c(counts, 1)
      last <- TRUE
    } else {
      last <- FALSE
    }

  }

  return(counts)

}

Upvotes: 0

Views: 166

Answers (2)

Piotr Kamoda
Piotr Kamoda

Reputation: 1006

Well, the code is quite good. I doubt using rle and with or which together would increase speed of the algorithm (much).

My proposition:

counting(int[] input, int value) {
    int[] len = int[size(input)](); \\assume worst case scenario
    len[0] = 0;
    int j = 0;
    for (i = 0; i < size(input); i++) { \\2 unit operations
        if (input[i] != value && len[j] == 0) \\followed relatively often, 3-4 unit operations (1-2 unit operations on this step)
            continue; \\5 unit operations
        else if (input[1] == value) \\also followed relatively often, 4 unit operations (on '&&' if first is false, second is not checked)
            len[j]++; \\5 unit operations
        else /*if (input[i] != value && len[j] != 0)*/ { \\4 unit operations (commented operation is not needed (only remaining possible outcome)
            j++; \\5 unit operations
            len[j] = 0; \\7 unit operations (access and substitution)
            continue; \\8 unit operations
        }
    }
}

As you can see there are at most 8 unit operations, and at best 4-5. Worst case scenario is that there are n/2 paths that go through 8 operations, but most of the cases I suppose it would follow one of the 5 step path.

Well, perhaps rle and those other functions are better optimized, but question is, is it optimized for your problem? I recommend checking.

Upvotes: 0

jalapic
jalapic

Reputation: 14192

you can use rle

rle(x)$lengths[which(rle(x)$values==0)]

# 1 1 2 3

for speed, you could only run rle once:

x1 <- rle(x)
x1$lengths[which(x1$values==0)]

Upvotes: 2

Related Questions