statsman

Reputation: 91

Fastest way to compute a rolling distance between high-dimensional vectors in numpy?

I have a time series of vectors: Y = [v1, v2, ..., vn]. At each time t, I want to compute the distance between vector t and the average of the vectors before t. So for example, at t=3 I want to compute the cosine distance between v3 and (v1+v2)/2.

I have a script to do it but wondering if there's any way to do this faster via numpy's convolve feature or something like that?

import numpy as np
from scipy.spatial.distance import cosine
np.random.seed(10)

# Generate `T` vectors of dimension `vector_dim`
# NOTE: In practice, the vector is a very large column vector! 
T = 3
vector_dim = 2
y = [np.random.rand(1, vector_dim)[0] for t in range(T)]


def moving_distance(v):
  moving_dists = []
  for t in range(len(v)):
    if t == 0: 
      pass
    else:
      # Create moving average of values up until time t
      prior_vals = v[:t]
      m_avg = np.add.reduce(prior_vals) / len(prior_vals) 
      # Now compute distance between this moving average and vector t
      moving_dists.append(cosine(m_avg, v[t]))
  return moving_dists

d = moving_distance(y)

For this dataset, it should return: [0.3337342770170698, 0.0029993196890111262]

Upvotes: 1

Views: 429

Answers (2)

norok2

Reputation: 26886

TL;DR

This is a much faster approach using NumPy (speedups above ~100x for even modest input sizes like 64x16):

import numpy as np


def cos_dist(a, b, axis=None):
    ab = np.sum(a * b, axis=axis)
    aa = np.sum(a * a, axis=axis)
    bb = np.sum(b * b, axis=axis)
    return 1 - (ab / np.sqrt(aa * bb))


def moving_dist_cumsum_np(arr, dist=cos_dist):
    return dist(np.cumsum(arr, axis=0)[:-1], arr[1:], axis=1)

which uses a custom definition of the cosine distance and is much more efficient than the OP's approach because it is fully vectorized.
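
As a quick sanity check (a minimal sketch reusing the seed and shapes from the question, with the list of vectors stacked into a 2D array), this reproduces the expected values:

import numpy as np

np.random.seed(10)
# Same data as in the question, stacked into a (3, 2) array
y = np.array([np.random.rand(1, 2)[0] for t in range(3)])

print(moving_dist_cumsum_np(y))
# [0.33373428 0.00299932] -- matches the values from the question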

A slightly faster and more memory-efficient approach (O(1) instead of O(n) temporary memory) uses Numba-accelerated explicit looping:

import numba as nb


@nb.njit
def cos_dist_nb(a, b):
    a = a.ravel()
    b = b.ravel()
    ab = aa = bb = 0
    n = len(a)
    for i in range(n):
        ab += a[i] * b[i]
        aa += a[i] * a[i]
        bb += b[i] * b[i]
    return 1 - (ab / (aa * bb) ** 0.5)


@nb.njit
def moving_dist_nb(arr, dist=cos_dist_nb):
    n, m = arr.shape
    result = np.empty(n - 1)
    moving = np.zeros(m)
    for i in range(n - 1):
        moving += arr[i, :]
        result[i] = dist(moving, arr[i + 1, :])
    return result
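
Note that the first call to a Numba-jitted function includes the JIT compilation time, so it is advisable to warm it up before timing it (a minimal sketch, with an arbitrary input shape):

import numpy as np

arr = np.random.random((1024, 64))
moving_dist_nb(arr)  # first call triggers JIT compilation
moving_dist_nb(arr)  # subsequent calls run at compiled speed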

Long Answer

The computation delineated in the OP can be sped up further with various optimizations.

OP's code is significantly more complex than needed.

Let us start with an adaptation that essentially just:

  • renames the main input
  • exposes the dist function
  • returns a NumPy array
  • replaces len(prior_vals) with t as it is the same value by construction

import numpy as np
import scipy as sp
import scipy.spatial.distance  # needed so that sp.spatial.distance resolves


def moving_dist_OP(arr, dist=sp.spatial.distance.cosine):
    moving_dists = []
    for t in range(len(arr)):
        if t == 0:
            pass
        else:
            # Create moving average of values up until time t
            prior_vals = arr[:t]
            m_avg = np.add.reduce(prior_vals) / t 
            # Now compute distance between this moving average and vector t
            moving_dists.append(dist(m_avg, arr[t]))
    return np.array(moving_dists)

Now, this can be simplified further:

def moving_dist_simpler(arr, dist=sp.spatial.distance.cosine):
    return np.array([dist(np.add.reduce(arr[:t]), arr[t]) for t in range(1, len(arr))])

This works because:

  • the loop appending can be rewritten as a list comprehension
  • the range can be made to start from 1 rather than skipping
  • the division by the length (a positive number) leaves the cosine distance unchanged and can be factored out

This last observation stems from the definition of the cosine distance for two vectors a and b of identical size, where a . b is the dot product of a and b and |a| = √(a . a) is the norm induced by said dot product:

cos_dist(a, b) = 1 - (a . b) / (|a| |b|)

If a is replaced with k * a for some k > 0 (with |k| denoting the absolute value of k), this becomes:

     1 - ((k * a) . b) / (|k * a| |b|)
 ->  1 - (k * (a . b)) / (|k| |a| |b|)
 ->  1 - sign(k) * (a . b) / (|a| |b|)
 ->  1 - (a . b) / (|a| |b|)
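
This scale invariance is easy to check numerically (a minimal sketch, using the cos_dist() defined in the TL;DR above):

rng = np.random.default_rng(0)
a, b = rng.random(8), rng.random(8)

# Rescaling one argument by a positive factor does not change the distance
print(np.isclose(cos_dist(5.0 * a, b), cos_dist(a, b)))  # True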

The np.add.reduce() computation is inefficient: the sum at each iteration could be obtained by updating the result of the previous iteration, but instead an ever-growing slice of the input is summed from scratch every time. Rewritten with partial sums, this becomes:

def moving_dist_part(arr, dist=sp.spatial.distance.cosine):
    n, m = arr.shape
    moving_dists = []
    moving = np.zeros(m)
    for i in range(n - 1):
        moving += arr[i, :]
        moving_dists.append(dist(moving, arr[i + 1]))
    return np.array(moving_dists)

It has already been noted (in @MechanicPig's answer) that the np.add.reduce() computation can also be rewritten with np.cumsum(), which is more efficient than np.add.reduce() and of similar efficiency to the partial sums, but uses more temporary memory (O(n) for np.cumsum() versus O(1) for partial sums):

def moving_dist_cumsum(arr, dist=sp.spatial.distance.cosine):
    movings = np.cumsum(arr, axis=0)[:-1]
    return np.array([dist(moving, arr[i]) for i, moving in enumerate(movings, 1)])

It is beneficial to rewrite this either fully vectorized or with simpler loops to be accelerated with Numba.

For the fully vectorized version, np.cumsum() is very helpful as it provides some of the partial computation in vector form.

Unfortunately, scipy.spatial.distance.cosine() does not accept higher dimensional input.
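
For example, feeding it 2D inputs fails outright:

import numpy as np
from scipy.spatial.distance import cosine

a = np.random.random((4, 3))
cosine(a, a)  # raises ValueError: the input vectors must be 1-D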

However, based on its definition, it is relatively simple to write a vectorized version of the cosine distance:

def cos_dist(a, b, axis=None):
    ab = np.sum(a * b, axis=axis)
    aa = np.sum(a * a, axis=axis)
    bb = np.sum(b * b, axis=axis)
    return 1 - (ab / np.sqrt(aa * bb))

With this, one can define a fully vectorized approach:

def moving_dist_cumsum_np(arr, dist=cos_dist):
    return dist(np.cumsum(arr, axis=0)[:-1], arr[1:], axis=1)

Note that the new definition of the cosine distance can be used just about anywhere else scipy.spatial.distance.cosine() was used, e.g.:

def moving_dist_cumsum2(arr, dist=cos_dist):
    movings = np.cumsum(arr, axis=0)[:-1]
    return np.array([dist(moving, arr[i]) for i, moving in enumerate(movings, 1)])

However, the vectorized version still has the shortcoming of requiring a potentially large (O(n)) temporary object to store the result of np.cumsum().

Fortunately, with a little more adaptation it is possible to write a Numba-accelerated version of this (similar to moving_dist_part()) that requires only O(1) temporary memory:

import numba as nb


@nb.njit
def cos_dist_nb(a, b):
    a = a.ravel()
    b = b.ravel()
    ab = aa = bb = 0
    n = len(a)
    for i in range(n):
        ab += a[i] * b[i]
        aa += a[i] * a[i]
        bb += b[i] * b[i]
    return 1 - (ab / (aa * bb) ** 0.5)


@nb.njit
def moving_dist_nb(arr, dist=cos_dist_nb):
    n, m = arr.shape
    result = np.empty(n - 1)
    moving = np.zeros(m)
    for i in range(n - 1):
        moving += arr[i, :]
        result[i] = dist(moving, arr[i + 1, :])
    return result

The above approaches can be benchmarked and plotted with the following (where smaller inputs are tested multiple times for more stable results):

import pandas as pd
import matplotlib.pyplot as plt


def benchmark(
    funcs,
    args=None,
    kws=None,
    ii=range(4, 15),
    m=16,
    kk=1024,
    is_equal=np.allclose,
    seed=0,
    unit="ms",
    verbose=True
):
    labels = [func.__name__ for func in funcs]
    units = {"s": 0, "ms": 3, "µs": 6, "ns": 9}
    args = tuple(args) if args else ()
    kws = dict(kws) if kws else {}
    assert unit in units
    np.random.seed(seed)
    timings = {}
    for i in ii:
        n = 2 ** i
        k = 1 + i * kk // n
        if verbose:
            print(f"i={i}, n={n}, m={m}, k={k}")
        arrs = np.random.random((k, n, m))
        base = np.array([funcs[0](arr, *args, **kws) for arr in arrs])
        timings[n] = []
        for func in funcs:
            res = np.array([func(arr, *args, **kws) for arr in arrs])
            is_good = is_equal(base, res)
            # NOTE: %timeit is an IPython magic, so this must run in IPython/Jupyter
            timed = %timeit -n 1 -r 1 -q -o [func(arr, *args, **kws) for arr in arrs]
            timing = timed.best / k
            timings[n].append(timing if is_good else None)
            if verbose:
                print(
                    f"{func.__name__:>24}"
                    f"  {is_good!s:5}"
                    f"  {timing * (10 ** units[unit]):10.3f} {unit}"
                    f"  {timings[n][0] / timing:5.1f}x")
    return timings, labels


def plot(timings, labels, xlabel="Input Size / #", unit="ms"):
    n_rows = 1
    n_cols = 3
    fig, axs = plt.subplots(n_rows, n_cols, figsize=(8 * n_cols, 6 * n_rows), squeeze=False)
    units = {"s": 0, "ms": 3, "µs": 6, "ns": 9}
    df = pd.DataFrame(data=timings, index=labels).transpose()
    
    base = df[[labels[0]]].to_numpy()
    (df * 10 ** units[unit]).plot(marker="o", xlabel=xlabel, ylabel=f"Best timing / {unit}", ax=axs[0, 0])
    (df / base * 100).plot(marker='o', xlabel=xlabel, ylabel='Relative speed / %', logx=True, ax=axs[0, 1])
    (base / df).plot(marker='o', xlabel=xlabel, ylabel='Speed Gain / x', ax=axs[0, 2])

    fig.patch.set_facecolor('white')

to be used as:

funcs = moving_dist_OP, moving_dist_simpler, moving_dist_part, moving_dist_cumsum, moving_dist_cumsum2, moving_dist_cumsum_np, moving_dist_nb

timings, labels = benchmark(funcs, unit="ms", verbose=True)

plot(timings, labels, "Benchmarks", unit="ms")

to obtain:

[benchmark plots: absolute timings, relative speed, and speed gain versus input size]

These results indicate that the Numba approach is by far the fastest, but the vectorized approach is also reasonably fast. When it comes to explicit non-accelerated looping, it is still beneficial to use the custom-defined cos_dist() in place of scipy.spatial.distance.cosine() (compare moving_dist_cumsum() with moving_dist_cumsum2()), while np.cumsum() is noticeably faster than np.add.reduce() but only marginally faster than computing the partial sums. Finally, moving_dist_OP() and moving_dist_simpler() are effectively equivalent (as expected).

Upvotes: 3

Mechanic Pig

Reputation: 7736

ndarray.cumsum() or np.add.accumulate() can be used to calculate the cumulative sum (here y is the list of vectors from the question stacked into a 2D array, i.e. y = np.array(y)):

>>> y
array([[0.77132064, 0.02075195],
       [0.63364823, 0.74880388],
       [0.49850701, 0.22479665]])
>>> y.cumsum(0)
array([[0.77132064, 0.02075195],
       [1.40496888, 0.76955583],
       [1.90347589, 0.99435248]])

Therefore, code equivalent to the function you provided is as follows:

>>> means = y.cumsum(0)[:-1] / np.arange(1, len(y))[:, None]
>>> [cosine(avg, vec) for avg, vec in zip(means, y[1:])]
[0.3337342770170698, 0.0029993196890111262]

Referring to the implementation of cosine, a more fully vectorized version is as follows:

>>> y_ = y[1:]
>>> uv = (means * y_).mean(1)
>>> uu = (means ** 2).mean(1)
>>> vv = (y_ ** 2).mean(1)
>>> np.clip(np.abs(1 - uv / np.sqrt(uu * vv)), 0, 2)
array([0.33373428, 0.00299932])
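
Note that using mean(1) here instead of sum(1) is harmless: the extra 1/n factors cancel inside the ratio, as a quick check confirms:

>>> uv_s = (means * y_).sum(1)
>>> uu_s = (means ** 2).sum(1)
>>> vv_s = (y_ ** 2).sum(1)
>>> np.allclose(uv / np.sqrt(uu * vv), uv_s / np.sqrt(uu_s * vv_s))
True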

Upvotes: 3
