Optimize testing all combinations of rows from multiple NumPy arrays

Question

I have three NumPy arrays of ints, same number of columns, arbitrary number of rows each. I am interested in all instances where a row of the first one plus a row of the second one gives a row of the third one ([3, 1, 4] + [1, 5, 9] = [4, 6, 13]).

Here is a pseudo-code:

for i, j in rows(array1), rows(array2):
    if i + j is in rows(array3):
        somehow store the rows this occured at (eg. (1,2,5) if 1st row of 
        array1 + 2nd row of array2 give 5th row of array3)

I will need to run this for very big matrices so I have two questions:

(1) I can write the above using nested loops but is there a quicker way, perhaps list comprehensions or itertools?

(2) What is the fastest/most memory-efficient way to store the triples? Later I will need to create a heatmap using two as coordinates and the first one as the corresponding value eg. point (2,5) has value 1 in the pseudo-code example.

Would be very grateful for any tips - I know this sounds quite simple but it needs to run fast and I have very little experience with optimization.

edit: My ugly code was requested in comments

import numpy as np

#random arrays
A = np.array([[-1,0],[0,-1],[4,1], [-1,2]])
B = np.array([[1,2],[0,3],[3,1]])
C = np.array([[0,2],[2,3]])

#triples stored as numbers with 2 coordinates in a otherwise-zero matrix
output_matrix = np.zeros((B.shape[0], C.shape[0]), dtype = int)
for i in range(A.shape[0]):
    for j in range(B.shape[0]):
        for k in range(C.shape[0]):
            if np.array_equal((A[i,] + B[j,]), C[k,]):
                output_matrix[j, k] = i+1

print(output_matrix)

Nelewout · Accepted Answer

(1) Improvements

You can use sets for the final result in the third matrix, as a + b = c must hold identically. This already replaces one nested loop with a constant-time lookup. I will show you an example of how to do this below, but we first ought to introduce some notation.

For a set-based approach to work, we need a hashable type. Lists will thus not work, but a tuple will: it is an ordered, immutable structure. There is, however, a problem: tuple addition is defined as appending, that is,

(0, 1) + (1, 0) = (0, 1, 1, 0).

This will not do for our use-case: we need element-wise addition. As such, we subclass the built-in tuple as follows,

class AdditionTuple(tuple):

    def __add__(self, other):
        """
        Element-wise addition.
        """
        if len(self) != len(other):
            raise ValueError("Undefined behaviour!")

        return AdditionTuple(self[idx] + other[idx]
                             for idx in range(len(self)))

Where we override the default behaviour of __add__. Now that we have a data-type amenable to our problem, let's prepare the data.

You give us,

A = [[-1, 0], [0, -1], [4, 1], [-1, 2]]
B = [[1, 2], [0, 3], [3, 1]]
C = [[0, 2], [2, 3]]

To work with. I say,

from types import SimpleNamespace

A = [AdditionTuple(item) for item in A]
B = [AdditionTuple(item) for item in B]
C = {tuple(item): SimpleNamespace(idx=idx, values=[])
     for idx, item in enumerate(C)}

That is, we modify A and B to use our new data-type, and turn C into a dictionary which supports (amortised) O(1) look-up times.

We can now do the following, eliminating one loop altogether,

from itertools import product

for a, b in product(enumerate(A), enumerate(B)):
    idx_a, a_i = a
    idx_b, b_j = b

    if a_i + b_j in C:  # a_i + b_j == c_k, identically
        C[a_i + b_j].values.append((idx_a, idx_b))

Then,

>>>print(C)
{(2, 3): namespace(idx=1, values=[(3, 2)]), (0, 2): namespace(idx=0, values=[(0, 0), (1, 1)])}

Where for each value in C, you get the index of that value (as idx), and a list of tuples of (idx_a, idx_b) whose elements of A and B together sum to the value at idx in C.

Let us briefly analyse the complexity of this algorithm. Redefining the lists A, B, and C as above is linear in the length of the lists. Iterating over A and B is of course in O(|A| * |B|), and the nested condition computes the element-wise addition of the tuples: this is linear in the length of the tuples themselves, which we shall denote k. The whole algorithm then runs in O(k * |A| * |B|).

This is a substantial improvement over your current O(k * |A| * |B| * |C|) algorithm.

(2) Matrix plotting

Use a dok_matrix, a sparse SciPy matrix representation. Then you can use any heatmap-plotting library you like on the matrix, e.g. Seaborn's heatmap.

Optimize testing all combinations of rows from multiple NumPy arrays

Answers (2)

(1) Improvements

(2) Matrix plotting

Related Questions