Alex Pharaon
Alex Pharaon

Reputation: 131

Heuristic to choose five column arrays that maximise the dot product

I have a sparse 60000x10000 matrix M where each element is either a 1 or 0. Each column in the matrix is a different combination of signals (ie. 1s and 0s). I want to choose five column vectors from M and take the Hadamard (ie. element-wise) product of them; I call the resulting vector the strategy vector. After this step, I compute the dot product of this strategy vector with a target vector (that does not change). The target vector is filled with 1s and -1s such that having a 1 in a specific row of the strategy vector is either rewarded or penalised.

Is there some heuristic or linear algebra method that I could use to help me pick the five vectors from the matrix M that result in a high dot product? I don't have any experience with Google's OR tools nor Scipy's optimization methods so I am not too sure if they can be applied to my problem. Advice on this would be much appreciated! :)

Note: the five column vectors given as the solution does not need to be the optimal one; I'd rather have something that does not take months/years to run.

Upvotes: 4

Views: 440

Answers (2)

Morton
Morton

Reputation: 159

First of all, thanks for a good question. I don't get to practice numpy that often. Also, I don't have much experience in posting to SE, so any feedback, code critique, and opinions relating to the answer are welcome.

This was an attempt at finding an optimal solution at first, but I didn't manage to deal with the complexity. The algorithm should, however, give you a greedy solution that might prove to be adequate.

Colab Notebook (Python code + Octave validation)

Core Idea

Note: During runtime, I've transposed the matrix. So, the column vectors in the question correspond to row vectors in the algorithm.

Notice that you can multiply the target with one vector at a time, effectively getting a new target, but with some 0s in it. These will never change, so you can filter out some computations by removing those rows (columns, in the algorithm) in further computations entirely - both from the target and the matrix. - you're then left with a valid target again (only 1s and -1 in it).

That's the basic idea of the algorithm. Given:

  • n: number of vectors you need to pick
  • b: number of best vectors to check
  • m: complexity of matrix operations to check one vector

Do an exponentially-complex O((n*m)^b) depth-first search, but decrease the complexity of the calculations in deeper layers by reducing target/matrix size, while cutting down a few search paths with some heuristics.

Heuristics used

  1. The best score achieved so far is known in every recursion step. Compute an optimistic vector (turn -1 to 0) and check what scores can still be achieved. Do not search in levels where the score cannot be surpassed.

    This is useless if the best vectors in the matrix have 1s and 0s equally distributed. The optimistic scores are just too high. However, it gets better with more sparsity.

  2. Ignore duplicates. Basically, do not check duplicate vectors in the same layer. Because we reduce the matrix size, the chance for ending up with duplicates increases in deeper recursion levels.

Further Thoughts on Heuristics The most valuable ones are those that eliminate the vector choices at the start. There's probably a way to find vectors that are worse-or-equal than others, with respect to their affects on the target. Say, if v1 only differs from v2 by an extra 1, and target has a -1 in that row, then v1 is worse-or-equal than v2.

The problem is that we need to find more than 1 vector, and can't readily discard the rest. If we have 10 vectors, each worse-or-equal than the one before, we still have to keep 5 at the start (in case they're still the best option), then 4 in the next recursion level, 3 in the following, etc.

Maybe it's possible to produce a tree and pass it on in into recursion? Still, that doesn't help trim down the search space at the start... Maybe it would help to only consider 1 or 2 of the vectors in the worse-or-equal chain? That would explore more diverse solutions, but doesn't guarantee that it's more optimal.

Warning: Note that the MATRIX and TARGET in the example are in int8. If you use these for the dot product, it will overflow. Though I think all operations in the algorithm are creating new variables, so are not affected.

Code

# Given:
TARGET = np.random.choice([1, -1], size=60000).astype(np.int8)
MATRIX = np.random.randint(0, 2, size=(10000,60000), dtype=np.int8)

# Tunable - increase to search more vectors, at the cost of time.
# Performs better if the best vectors in the matrix are sparse
MAX_BRANCHES = 3   # can give more for sparser matrices

# Usage
score, picked_vectors_idx = pick_vectors(TARGET, MATRIX, 5)

# Function
def pick_vectors(init_target, init_matrix, vectors_left_to_pick: int, best_prev_result=float("-inf")):
    assert vectors_left_to_pick >= 1
    if init_target.shape == (0, ) or len(init_matrix.shape) <= 1 or init_matrix.shape[0] == 0 or init_matrix.shape[1] == 0:
        return float("inf"), None
    target = init_target.copy()
    matrix = init_matrix.copy()

    neg_matrix = np.multiply(target, matrix)
    neg_matrix_sum = neg_matrix.sum(axis=1)

    if vectors_left_to_pick == 1:
        picked_id = np.argmax(neg_matrix_sum)
        score = neg_matrix[picked_id].sum()
        return score, [picked_id]

    else:
        sort_order = np.argsort(neg_matrix_sum)[::-1]
        sorted_sums = neg_matrix_sum[sort_order]
        sorted_neg_matrix = neg_matrix[sort_order]
        sorted_matrix = matrix[sort_order]

        best_score = best_prev_result
        best_picked_vector_idx = None

        # Heuristic 1 (H1) - optimistic target.
        # Set a maximum score that can still be achieved
        optimistic_target = target.copy()
        optimistic_target[target == -1] = 0
        if optimistic_target.sum() <= best_score:
            # This check can be removed - the scores are too high at this point
            return float("-inf"), None

        # Heuristic 2 (H2) - ignore duplicates
        vecs_tried = set()

        # MAIN GOAL:   for picked_id, picked_vector in enumerate(sorted_matrix):
        for picked_id, picked_vector in enumerate(sorted_matrix[:MAX_BRANCHES]):
            # H2
            picked_tuple = tuple(picked_vector)
            if picked_tuple in vecs_tried:
                continue
            else:
                vecs_tried.add(picked_tuple)

            # Discard picked vector
            new_matrix = np.delete(sorted_matrix, picked_id, axis=0)
            
            # Discard matrix and target rows where vector is 0
            ones = np.argwhere(picked_vector == 1).squeeze()
            new_matrix = new_matrix[:, ones]
            new_target = target[ones]
            if len(new_matrix.shape) <= 1 or new_matrix.shape[0] == 0:
                return float("-inf"), None

            # H1: Do not compute if best score cannot be improved
            new_optimistic_target = optimistic_target[ones]
            optimistic_matrix = np.multiply(new_matrix, new_optimistic_target)
            optimistic_sums = optimistic_matrix.sum(axis=1)
            optimistic_viable_vector_idx = optimistic_sums > best_score
            if optimistic_sums.max() <= best_score:
                continue
            new_matrix = new_matrix[optimistic_viable_vector_idx]
         
            score, next_picked_vector_idx = pick_vectors(new_target, new_matrix, vectors_left_to_pick - 1, best_prev_result=best_score)
            
            if score <= best_score:
                continue

            # Convert idx of trimmed-down matrix into sorted matrix IDs
            for i, returned_id in enumerate(next_picked_vector_idx):
                # H1: Loop until you hit the required number of 'True'
                values_passed = 0
                j = 0
                while True:
                    value_picked: bool = optimistic_viable_vector_idx[j]
                    if value_picked:
                        values_passed += 1
                        if values_passed-1 == returned_id:
                            next_picked_vector_idx[i] = j
                            break
                    j += 1

                # picked_vector index
                if returned_id >= picked_id:
                    next_picked_vector_idx[i] += 1

            best_score = score

            # Convert from sorted matrix to input matrix IDs before returning
            matrix_id = sort_order[picked_id]
            next_picked_vector_idx = [sort_order[x] for x in next_picked_vector_idx]
            best_picked_vector_idx = [matrix_id] + next_picked_vector_idx

        return best_score, best_picked_vector_idx

Upvotes: 2

Matt Hall
Matt Hall

Reputation: 8152

Maybe it's too naive, but the first thing that occurs to me is to choose the 5 columns with the shortest distance to the target:

import scipy
import numpy as np
from sklearn.metrics.pairwise import pairwise_distances

def sparse_prod_axis0(A):
    """Sparse equivalent of np.prod(arr, axis=0)
    From https://stackoverflow.com/a/44321026/3381305
    """
    valid_mask = A.getnnz(axis=0) == A.shape[0]
    out = np.zeros(A.shape[1], dtype=A.dtype)
    out[valid_mask] = np.prod(A[:, valid_mask].A, axis=0)
    return np.matrix(out)

def get_strategy(M, target, n=5):
    """Guess n best vectors.
    """
    dists = np.squeeze(pairwise_distances(X=M, Y=target))
    idx = np.argsort(dists)[:n]
    return sparse_prod_axis0(M[idx])

# Example data.
M = scipy.sparse.rand(m=6000, n=1000, density=0.5, format='csr').astype('bool')
target = np.atleast_2d(np.random.choice([-1, 1], size=1000))

# Try it.
strategy = get_strategy(M, target, n=5)
result = strategy @ target.T

It strikes me that you could add another step of taking the top few percent from the Mtarget distances and check their mutual distances — but this could be quite expensive.

I have not checked how this compares to an exhaustive search.

Upvotes: 0

Related Questions