DsCpp
DsCpp

Reputation: 2489

Find the number of occurrences of a sequence

I'm looking for an efficient way (maybe numpy?) to count the number of occurrences of a sequence of numbers in a 2D array.

e.g

count_seq_occ([2,3],
          array([[ 2,  3 ,  5,  2,  3],
                [  5,  2,  3],
                [  1]]))

Will output the result 3. The Three way nested loop option is clear, but maybe a better approach exists?
Thanks

Upvotes: 1

Views: 1528

Answers (2)

jpp
jpp

Reputation: 164693

This is one way, but I hope there is a better solution. It would be helpful if you showed us your nested loop and provided some data for benchmarking.

from itertools import chain

x = [2, 3]

A = np.array([[  2,  3,  5,  2,  3],
              [  5,  2,  3],
              [  1]])

arr = list(chain.from_iterable(A))
res = sum(arr[i:i+len(x)] == x for i in range(len(arr)))  # 3

Upvotes: 1

Shemetz
Shemetz

Reputation: 109

EDITED

KMP search

Try using this code and editing it to search in every vector of the matrix: http://code.activestate.com/recipes/117214/

It's a KMP (Knuth-Morris-Pratt) python function for finding a pattern in a text or list. You can slightly optimize it by creating the search pattern's shifts array once, then running the rest of the algorithm on every 1D sub-array.

Alternative

How about converting the array into a string representation and counting occurrences in the string?

repr(your_array).count("2, 3")

Notice: you should really format the representation or counted substring to both match the same style. For example, sometimes a repr() of a numpy array would return something like this inside: "1., 2., 3., " and you might want to fix this somehow.

Alternatively you can flatten the array and join all rows into a string, but be careful and add an extra unique character after every row.

The method could vary a bit regarding how you turn it into a string, but it should be fast enough. Searching for substrings in a string is O(n) time so you shouldn't worry about that. The only possible reason not to use this method would be if you don't want to allocate the temporary string object if the array is very large.

Upvotes: 2

Related Questions