How to find elements that match a specific pattern in a 2d list

Question

I would like to find an efficient way to retrieve all the elements in an array that match a specific pattern.

For example, considering that I have:

an array M composed of sub-arrays of different sizes:

  M = [[0, 1],
       [3, 2, 4],
       [3, 8],
       [9],
       [0, 2],
       [3, 1],
       [0, 3],
       [2, 4],
       [3, 7]]

A pattern of subarrays. For example, [[a, b], [a, c], [a, d]] matches [[0, 1], [0, 2], [0, 3]].

How can I return all the elements of M that correspond to the pattern?

So far I have been using for loops to find matching elements but this naive approach turns out to be very costly when the pattern has more than 2 sub-arrays.

Example:

M = [[0, 1], [3, 2, 4], [3, 8], [9], [0, 2], [3, 1], [0, 3], [2, 4], [3, 7]]

# pattern with 3 sub-arrays -> [[a, b], [a, c], [a, d]]

for i, arr1 in enumerate(M):
    for j, arr2 in enumerate(M):
        for k, arr3 in enumerate(M):
            if i != j != k:
                if len(arr1) == len(arr2) == len(arr3) == 2:
                    a1, a2, a3 = arr1[0], arr2[0], arr3[0]
                    b, c, d = arr1[1], arr2[1], arr3[1]
                    if a1 == a2 == a3 and b < c < d:
                        print arr1, arr2, arr3

Output:

[0,1], [0,2], [0,3]
[3,1], [3,7], [3,8]

Since each sub-array accounts for an additional nested loop, the time complexity of this method (O(n^k) where k is the number of sub-arrays), becomes an issue.

Is it possible to speed up this process? If so, how?

Mad Physicist · Accepted Answer

First, before jumping into numpy, let's take a look at your conditions. You require the sub-arrays to have only two elements. So let's pre-filter your array:

M = [m for m in M if len(m) == 2]

Now you are checking a1 == a2 == a3 and b < c < d, but each possible permutation of b, c, d shows up in the sequence. So really, if you find any b != c != d for a given a, you can rearrange it to the right order, knowing that that order will show up eventually.

A very simple way to handle this is therefore to construct a dictionary mapping a to all possible options for b, c, d, filter them for a minimum of the number of "subarrays" you want, sort them, and compute all the possible combinations:

# set removed duplicates automatically
options = collections.defaultdict(set)

for a, b in (m for m in M if len(m) == 2):  # Use a generator to filter on-the-fly
    options[a].add(b)

for a, bcd in options.items():
    # sort (combinations automatically filters too-short bins)
    for b, c, d in itertools.combinations(sorted(bcd), 3):
        print(f'[{a}, {b}], [{a}, {c}], [{a}, {d}]')

This solution is likely algorithmically optimal. It makes a single pass over the initial list to identify potential patterns, and then performs exactly one iteration per pattern. The only thing that is potentially missing here is that duplicates are eliminated entirely. You can handle duplicates by using collections.Counter instead of set.

How to find elements that match a specific pattern in a 2d list

Answers (1)

Related Questions