Sliding window on list of lists in Python

Question

I'm trying to use numpy/pandas to constuct a sliding window style comparator. I've got list of lists each of which is a different length. I want to compare each list to to another list as depicted below:

lists = [[10,15,5],[5,10],[5]]

window_diff(l[1],l[0]) = 25

The window diff for lists[0] and lists[1] would give 25 using the following window sliding technique shown in the image below. Because lists[1] is the shorter path we shift it once to the right, resulting in 2 windows of comparison. If you sum the last row in the image below we get the total difference between the two lists using the two windows of comparison; in this case a total of 25. To note we are taking the absolute difference.

The function should aggregate the total window_diff between each list and the other lists, so in this case

tot = total_diffs(lists)
tot>>[40, 30, 20]

# where tot[0] represents the sum of lists[0] window_diff with all other lists.

I wanted to know if there was a quick route to doing this in pandas or numpy. Currently I am using a very long winded process of for looping through each of the lists and then comparing bitwise by shifting the shorter list in accordance to the longer list.

My approach works fine for short lists, but my dataset is 10,000 lists long and some of these lists contain 60 or so datapoints, so speed is a criteria here. I was wondering if numpy, pandas had some advice on this? Thanks

Sample problem data

from random import randint
lists = [[random.randint(0,1000) for r in range(random.randint(0,60))] for x in range(100000)]

Divakar · Accepted Answer

Steps :

Among each pair of lists from the input list of lists create sliding windows for the bigger array and then get the absolute difference against the smaller one in that pair. We can use NumPy strides to get those sliding windows.
Get the total sum and store this summation as a pair-wise differentiation.
Finally sum along each row and col on the 2D array from previous step and their summation is final output.

Thus, the implementation would look something like this -

import itertools

def strided_app(a, L, S=1 ):  # Window len = L, Stride len/stepsize = S
    a = np.asarray(a)
    nrows = ((a.size-L)//S)+1
    n = a.strides[0]
    return np.lib.stride_tricks.as_strided(a, shape=(nrows,L), strides=(S*n,n))

N = len(lists)
pair_diff_sums = np.zeros((N,N),dtype=type(lists[0][0]))
for i, j in itertools.combinations(range(N), 2):
    A, B = lists[i], lists[j]
    if len(A)>len(B):
        pair_diff_sums[i,j] = np.abs(strided_app(A,L=len(B)) - B).sum()
    else:
        pair_diff_sums[i,j] = np.abs(strided_app(B,L=len(A)) - A).sum()

out = pair_diff_sums.sum(1) + pair_diff_sums.sum(0)

For really heavy datasets, here's one method using one more level of looping -

N = len(lists)
out = np.zeros((N),dtype=type(lists[0][0]))
for k,i in enumerate(lists):
    for j in lists:
        if len(i)>len(j):
            out[k] += np.abs(strided_app(i,L=len(j)) - j).sum()
        else:
            out[k] += np.abs(strided_app(j,L=len(i)) - i).sum()

strided_app is inspired from here.

Sample input, output -

In [77]: lists
Out[77]: [[10, 15, 5], [5, 10], [5]]

In [78]: pair_diff_sums
Out[78]: 
array([[ 0, 25, 15],
       [25,  0,  5],
       [15,  5,  0]])

In [79]: out
Out[79]: array([40, 30, 20])

Sliding window on list of lists in Python

Sample problem data

Answers (2)

Related Questions