memyself
memyself

Reputation: 12618

return index of sequence of repeating numbers in array

given an array:

array = [16 16 16 22 23 23 23 25 52 52 52]

I want return a list of indices that point to the elements of three repeating numbers. In this case that would be :

indices = find_sequence(nbr_repeats = 3)
print indices
 [0 1 2  4 5 6  8 9 10] 

what is the fastest and most elegant algorithm to use in order to implement find_sequence?

Upvotes: 0

Views: 418

Answers (4)

Attila
Attila

Reputation: 28762

Since you did not specify a language, here is a pseudocode:

find_sequence(array: array of int, nbr_repeats: int) : array of int
  retVal = emty array of int // the return'd array
  last = empty array of int  // collection of last seen same elements
  i = -1
  for each element e in array
    ++i
    if (isempty(last))
      add(last, e)   // just starting
    else if (count(last, e) >= nbr_repeats)
      add(retVal, i-nbr_repeats) // found an index
    else if (e == first(last))
      add(last, e)   // we have encountered this element before
    else
      if (count(last, e) >= nbr_repeats)
        for (j=nbr_repeats-1; j>0; --j)
          add(retVal, i-j) // catching up to i with indices
      last = [e]     // new element

    if (count(last, e) >= nbr_repeats)
      for (j=nbr_repeats-1; j>0; --j)
        add(retVal, i-j) // handle end of array properly

  return retVal

Edit: removed comment about sorting as it would mangle the original indices.

Note: you could also just keep the last element and its seen-count instead of maintaining a list of last same elements

Upvotes: 0

Simon MᶜKenzie
Simon MᶜKenzie

Reputation: 8664

This looks to me like a special case of the Boyer-Moore string search algorithm, and since any language you use will contain optimisations for string search, perhaps the most elegant answer is to treat your data as a character array (i.e. a string) and use your language's built in string search functions... Note that this only works if your numbers fit into your language's supported character set (e.g. no numbers bigger than 128 in ASCII)

Upvotes: 1

xvatar
xvatar

Reputation: 3259

Based on OP's assumption:

  1. the list is sorted
  2. the largest frequency is nbr_repeats

This might work:

def find_sequence(nbr_repeats, l):
    res = []
    current = -1
    count = 0
    idx = 0
    for i in l:
        if i == current:
            count += 1
            if count == nbr_repeats:
                for k in reversed(range(nbr_repeats)):
                    res.append(idx-k)
        else:
            current = i
            count = 1
        idx += 1
    return res

Upvotes: 1

cHao
cHao

Reputation: 86506

Simplest way i know of...keep a track of the first place you saw a number. Keep on going til you find a different number, then if the sequence is long enough, add all the numbers from the start of the sequence til just before the end.

(Of course, you'll have to check the sequence length after you're done checking elements, too. I did it by iterating one past the end and just skipping the element check on the last iteration.)

To find_repeats (input : list, minimum : integer):
    start := 0
    result := []
    for each x from 0 to (input length):
        ' "*or*" here is a short-circuit or
        ' so we don't go checking an element that doesn't exist
        if x == (input length) *or* array[x] != array[start]:
            if (x - start) >= minimum:
                append [start...(x - 1)] to result
            start := x
    return result

Upvotes: 2

Related Questions