Reputation: 1363
I would like to find an efficient way to retrieve all the elements in an array that match a specific pattern.
For example, considering that I have:
an array M
composed of sub-arrays of different sizes:
M = [[0, 1],
[3, 2, 4],
[3, 8],
[9],
[0, 2],
[3, 1],
[0, 3],
[2, 4],
[3, 7]]
A pattern of subarrays. For example, [[a, b], [a, c], [a, d]]
matches [[0, 1], [0, 2], [0, 3]]
.
How can I return all the elements of M
that correspond to the pattern?
So far I have been using for
loops to find matching elements but this naive approach turns out to be very costly when the pattern has more than 2 sub-arrays.
Example:
M = [[0, 1], [3, 2, 4], [3, 8], [9], [0, 2], [3, 1], [0, 3], [2, 4], [3, 7]]
# pattern with 3 sub-arrays -> [[a, b], [a, c], [a, d]]
for i, arr1 in enumerate(M):
for j, arr2 in enumerate(M):
for k, arr3 in enumerate(M):
if i != j != k:
if len(arr1) == len(arr2) == len(arr3) == 2:
a1, a2, a3 = arr1[0], arr2[0], arr3[0]
b, c, d = arr1[1], arr2[1], arr3[1]
if a1 == a2 == a3 and b < c < d:
print arr1, arr2, arr3
Output:
[0,1], [0,2], [0,3]
[3,1], [3,7], [3,8]
Since each sub-array accounts for an additional nested loop, the time complexity of this method (O(n^k)
where k
is the number of sub-arrays), becomes an issue.
Is it possible to speed up this process? If so, how?
Upvotes: 1
Views: 914
Reputation: 114330
First, before jumping into numpy, let's take a look at your conditions. You require the sub-arrays to have only two elements. So let's pre-filter your array:
M = [m for m in M if len(m) == 2]
Now you are checking a1 == a2 == a3 and b < c < d
, but each possible permutation of b
, c
, d
shows up in the sequence. So really, if you find any b != c != d
for a given a
, you can rearrange it to the right order, knowing that that order will show up eventually.
A very simple way to handle this is therefore to construct a dictionary mapping a
to all possible options for b
, c
, d
, filter them for a minimum of the number of "subarrays" you want, sort them, and compute all the possible combinations:
# set removed duplicates automatically
options = collections.defaultdict(set)
for a, b in (m for m in M if len(m) == 2): # Use a generator to filter on-the-fly
options[a].add(b)
for a, bcd in options.items():
# sort (combinations automatically filters too-short bins)
for b, c, d in itertools.combinations(sorted(bcd), 3):
print(f'[{a}, {b}], [{a}, {c}], [{a}, {d}]')
This solution is likely algorithmically optimal. It makes a single pass over the initial list to identify potential patterns, and then performs exactly one iteration per pattern. The only thing that is potentially missing here is that duplicates are eliminated entirely. You can handle duplicates by using collections.Counter
instead of set
.
Upvotes: 1