Thomas Kühn

Reputation: 9820

numpy method to join two meshgrids and their result arrays

Consider two n-dimensional, possibly overlapping, numpy meshgrids, say

m1 = (x1, y1, z1, ...)
m2 = (x2, y2, z2, ...)

Within m1 and m2 there are no duplicate coordinate tuples. Each meshgrid has an associated result array, possibly produced by a different function:

r1 = f1(m1)
r2 = f2(m2)

such that f1(m) != f2(m). Now I would like to join those two meshgrids and their result arrays, e.g. m = m1 & m2 and r = r1 & r2 (where & denotes some kind of union), such that the coordinate tuples in m are still sorted and the values in r still correspond to the original coordinate tuples. Newly created coordinate tuples should be identifiable (for instance by a special value in r).

To elaborate on what I'm after, I have two examples that kind of do what I want with simple for and if statements. Here's a 1D example:

x1 = [1, 5, 7]
r1 = [i**2 for i in x1]

x2 = [2, 4, 6]
r2 = [i*3 for i in x2]

x,r = list(zip(*sorted([(i,j) for i,j in zip(x1+x2,r1+r2)],key=lambda x: x[0])))

which gives

x = (1, 2, 4, 5, 6, 7)
r = (1, 6, 12, 25, 18, 49)
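(For the record, the same 1D join can be written without the zip/sort round-trip by sorting both concatenated arrays with a single np.argsort:)

```python
import numpy as np

# Vectorized sketch of the 1D join above, on the same example data:
# concatenate coordinates and results, then reorder both with the
# permutation that sorts the coordinates.
x1 = np.array([1, 5, 7]); r1 = x1**2
x2 = np.array([2, 4, 6]); r2 = x2*3

x = np.concatenate((x1, x2))
r = np.concatenate((r1, r2))
order = np.argsort(x)        # permutation that sorts the coordinates
x, r = x[order], r[order]
# x -> [1 2 4 5 6 7], r -> [1 6 12 25 18 49]
```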

For 2D it starts getting quite complicated:

import numpy as np
a1 = [1, 5, 7]
b1 = [2, 5, 6]

x1,y1 = np.meshgrid(a1,b1)
r1 = x1*y1

a2 = [2, 4, 6]
b2 = [1, 3, 8]

x2, y2 = np.meshgrid(a2,b2)
r2 = 2*x2

a = [1, 2, 4, 5, 6, 7]
b = [1, 2, 3, 5, 6, 8]

x,y = np.meshgrid(a,b)

r = np.ones(x.shape)*-1

for i in range(x.shape[0]):
    for j in range(x.shape[1]):
        if   x[i,j] in a1 and y[i,j] in b1:
            r[i,j] = r1[a1.index(x[i,j]),b1.index(y[i,j])]

        elif x[i,j] in a2 and y[i,j] in b2:
            r[i,j] = r2[a2.index(x[i,j]),b2.index(y[i,j])]

This gives the desired result, with new coordinate pairs having the value -1:

x=
[[1 2 4 5 6 7]
 [1 2 4 5 6 7]
 [1 2 4 5 6 7]
 [1 2 4 5 6 7]
 [1 2 4 5 6 7]
 [1 2 4 5 6 7]]
y=
[[1 1 1 1 1 1]
 [2 2 2 2 2 2]
 [3 3 3 3 3 3]
 [5 5 5 5 5 5]
 [6 6 6 6 6 6]
 [8 8 8 8 8 8]]
r=
[[ -1.   4.   4.  -1.   4.  -1.]
 [  2.  -1.  -1.   5.  -1.   6.]
 [ -1.   8.   8.  -1.   8.  -1.]
 [ 10.  -1.  -1.  25.  -1.  30.]
 [ 14.  -1.  -1.  35.  -1.  42.]
 [ -1.  12.  12.  -1.  12.  -1.]]

but this will also become slow quickly with increasing dimensions and array sizes. So here, finally, the question: how can this be done using only numpy functions? If it is not possible, what would be the fastest way to implement this in Python? If it is at all relevant, I prefer using Python 3. Note that the functions I use in the examples are not the actual functions I use.

Upvotes: 3

Views: 3806

Answers (2)

Thomas Kühn

Reputation: 9820

Divakar's answer is exactly what I needed. However, I also wanted to try out the second suggestion from that answer, and on top of that I did some profiling. I thought the results may be interesting to others. Here is the code I used for profiling:

import numpy as np
import timeit
import random

def for_join_2d(x1,y1,r1, x2,y2,r2):
    """
    The algorithm from the question.
    """

    # recover the coordinate vectors from the grids so that the function
    # does not rely on the globals a1, b1, a2, b2 used further down
    a1, b1 = list(x1[0,:]), list(y1[:,0])
    a2, b2 = list(x2[0,:]), list(y2[:,0])

    a = sorted(a1+a2)
    b = sorted(b1+b2)

    x,y = np.meshgrid(a,b)
    r = np.ones(x.shape)*-1

    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            if   x[i,j] in a1 and y[i,j] in b1:
                r[i,j] = r1[a1.index(x[i,j]),b1.index(y[i,j])]

            elif x[i,j] in a2 and y[i,j] in b2:
                r[i,j] = r2[a2.index(x[i,j]),b2.index(y[i,j])]
    return x,y,r


def mask_join_2d(x1,y1,r1,x2,y2,r2):
    """
    Divakar's original answer.
    """
    # recover the coordinate vectors from the grids so that the masks
    # below do not rely on the globals a1, b1, a2, b2
    a1, b1 = x1[0,:], y1[:,0]
    a2, b2 = x2[0,:], y2[:,0]

    a = np.sort(np.concatenate((a1,a2)))
    b = np.sort(np.concatenate((b1,b2)))

    # Initialize o/p array
    x,y = np.meshgrid(a,b)
    r_out = np.full([len(b), len(a)],-1)   # len(b) rows, len(a) cols, matching x.shape

    # Assign for the IF part
    mask_a1 = np.in1d(a,a1)
    mask_b1 = np.in1d(b,b1)
    r_out[np.ix_(mask_b1, mask_a1)] = r1.T

    # Assign for the ELIF part
    mask_a2 = np.in1d(a,a2)
    mask_b2 = np.in1d(b,b2)
    r_out[np.ix_(mask_b2, mask_a2)] = r2.T

    return x,y,r_out


def searchsort_join_2d(x1,y1,r1,x2,y2,r2):
    """
    Divakar's second suggested solution using searchsort.
    """

    # recover the coordinate vectors from the grids so that the index
    # arrays below do not rely on the globals a1, b1, a2, b2
    a1, b1 = x1[0,:], y1[:,0]
    a2, b2 = x2[0,:], y2[:,0]

    a = np.sort(np.concatenate((a1,a2)))
    b = np.sort(np.concatenate((b1,b2)))

    # Initialize o/p array
    x,y = np.meshgrid(a,b)
    r_out = np.full([len(b), len(a)],-1)   # len(b) rows, len(a) cols, matching x.shape

    #the IF part
    ind_a1 = np.searchsorted(a,a1)
    ind_b1 = np.searchsorted(b,b1)
    r_out[np.ix_(ind_b1,ind_a1)] = r1.T

    #the ELIF part
    ind_a2 = np.searchsorted(a,a2)
    ind_b2 = np.searchsorted(b,b2)
    r_out[np.ix_(ind_b2,ind_a2)] = r2.T

    return x,y,r_out

##the profiling code:
if __name__ == '__main__':

    N1 = 100
    N2 = 100

    coords_a = [i for i in range(N1)]
    coords_b = [i*2 for i in range(N2)]

    a1 = random.sample(coords_a, N1//2)
    b1 = random.sample(coords_b, N2//2)

    a2 = [i for i in coords_a if i not in a1]
    b2 = [i for i in coords_b if i not in b1]

    x1,y1 = np.meshgrid(a1,b1)
    r1 = x1*y1
    x2,y2 = np.meshgrid(a2,b2)
    r2 = 2*x2

    print("original for loop")
    print(min(timeit.Timer(
        'for_join_2d(x1,y1,r1,x2,y2,r2)',
        setup = 'from __main__ import for_join_2d,x1,y1,r1,x2,y2,r2',
    ).repeat(7,1000)))

    print("with masks")
    print(min(timeit.Timer(
        'mask_join_2d(x1,y1,r1,x2,y2,r2)',
        setup = 'from __main__ import mask_join_2d,x1,y1,r1,x2,y2,r2',
    ).repeat(7,1000)))

    print("with searchsort")
    print(min(timeit.Timer(
        'searchsort_join_2d(x1,y1,r1,x2,y2,r2)',
        setup = 'from __main__ import searchsort_join_2d,x1,y1,r1,x2,y2,r2',
    ).repeat(7,1000)))

For each function I used 7 sets of 1000 iterations and picked the fastest set for evaluation. The results for two 10x10 arrays were:

original for loop
0.5114614190533757

with masks
0.21544912096578628

with searchsort
0.12026709201745689

and for two 100x100 arrays it was:

original for loop
247.88183582702186

with masks
0.5245905339252204

with searchsort
0.2439237720100209

For big matrices the use of numpy functionality unsurprisingly makes a huge difference, and indeed using searchsorted with index arrays instead of boolean masks roughly halves the run time again.

Upvotes: 1

Divakar

Reputation: 221604

We can use masking for the "A in B" membership tests to get 1D masks. Then, we can use those masks with np.ix_ to extend them to the desired number of dimensions.

Thus, for a 2D case, it would be something along these lines -

# Initialize o/p array
r_out = np.full([len(b), len(a)],-1)   # len(b) rows, len(a) cols, matching meshgrid(a,b)

# Assign for the IF part
mask_a1 = np.in1d(a,a1)
mask_b1 = np.in1d(b,b1)
r_out[np.ix_(mask_b1, mask_a1)] = r1.T

# Assign for the ELIF part
mask_a2 = np.in1d(a,a2)
mask_b2 = np.in1d(b,b2)
r_out[np.ix_(mask_b2, mask_a2)] = r2.T

a could be created, like so -

a = np.concatenate((a1,a2))
a.sort()

Similarly, for b.
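Putting those two steps together on the sample data from the question gives a minimal runnable sketch (np.isin is the newer name for np.in1d, and r_out is shaped (len(b), len(a)) to line up with the meshgrid layout) -

```python
import numpy as np

# grid 1 and its result array (placeholder function, as in the question)
a1, b1 = [1, 5, 7], [2, 5, 6]
x1, y1 = np.meshgrid(a1, b1)
r1 = x1 * y1

# grid 2 with a different placeholder function
a2, b2 = [2, 4, 6], [1, 3, 8]
x2, y2 = np.meshgrid(a2, b2)
r2 = 2 * x2

a = np.concatenate((a1, a2)); a.sort()
b = np.concatenate((b1, b2)); b.sort()

# -1 marks newly created coordinate pairs
r_out = np.full([len(b), len(a)], -1)
r_out[np.ix_(np.isin(b, b1), np.isin(a, a1))] = r1.T
r_out[np.ix_(np.isin(b, b2), np.isin(a, a2))] = r2.T
# first two rows of r_out:
# [-1  4  4 -1  4 -1]
# [ 2 -1 -1  5 -1  6]
```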

Also, we could make use of indices instead of masks for use with np.ix_. For the same, we could use np.searchsorted. Thus, instead of the mask np.in1d(a,a1), we could get corresponding indices with np.searchsorted(a,a1) and so on for the rest of the masks. This should be considerably faster.
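A tiny illustration of those index arrays on the question's data -

```python
import numpy as np

a1, a2 = np.array([1, 5, 7]), np.array([2, 4, 6])
a = np.sort(np.concatenate((a1, a2)))

# positions of the elements of a1 inside the sorted union a;
# these integer index arrays can replace the boolean masks in np.ix_
ind_a1 = np.searchsorted(a, a1)
# ind_a1 -> array([0, 3, 5])
```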


For a 3D case, I would assume that we would have another array, say c. Thus, the initialization part would involve using len(c). There would be one more mask/index-array corresponding to c, hence one more term in np.ix_, and the transpose of r1 and r2 would become an appropriate axis permutation (np.transpose with an explicit axes argument).
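A hedged sketch of such a 3D case with made-up sample arrays; using indexing='ij' grids here so that the result arrays already have shape (len(a), len(b), len(c)) and no transposes are needed (the functions are placeholders, as in the question) -

```python
import numpy as np

# made-up sample coordinate vectors for two 3D grids
a1, b1, c1 = np.array([1, 5]), np.array([2, 6]), np.array([3, 4])
a2, b2, c2 = np.array([2, 4]), np.array([1, 8]), np.array([5, 7])

x1, y1, z1 = np.meshgrid(a1, b1, c1, indexing='ij')
r1 = x1 * y1 * z1          # placeholder for f1
x2, y2, z2 = np.meshgrid(a2, b2, c2, indexing='ij')
r2 = 2 * x2                # placeholder for f2

a = np.sort(np.concatenate((a1, a2)))
b = np.sort(np.concatenate((b1, b2)))
c = np.sort(np.concatenate((c1, c2)))

# initialization now also uses len(c); -1 marks new coordinate tuples
r_out = np.full((len(a), len(b), len(c)), -1)

# one index array per axis -> one more term in np.ix_
r_out[np.ix_(np.searchsorted(a, a1),
             np.searchsorted(b, b1),
             np.searchsorted(c, c1))] = r1
r_out[np.ix_(np.searchsorted(a, a2),
             np.searchsorted(b, b2),
             np.searchsorted(c, c2))] = r2
```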

Upvotes: 2
