daedalus12

Reputation: 394

Python/Numpy: Build 2D array without adding duplicate rows (for triangular mesh)

I'm working on some code that manipulates 3D triangular meshes. Once I have imported mesh data, I need to "unify" vertices that are at the same point in space.

I've been assuming that numpy arrays would be the fastest way of storing & manipulating the data, but I can't seem to find a fast way of building a list of vertices while avoiding adding duplicate entries.

So, to test out methods, I create a 30000x3 array with 10000 unique rows:

import numpy as np

points = np.random.random((10000, 3))                # 10000 unique 3D points
raw_data = np.concatenate((points, points, points))  # each point appears 3 times
np.random.shuffle(raw_data)                          # shuffle rows in place

This serves as a good approximation of mesh data, with each point appearing as a facet vertex 3 times. While unifying, I need to build a list of unique vertices; if a point is already in the list, a reference to it must be stored instead.
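Concretely, for a tiny hand-made input (not the real mesh data), the result I'm after would look like this:

# three rows, where rows 0 and 2 coincide
data = np.array([[0.0, 0.0, 0.0],
                 [1.0, 0.0, 0.0],
                 [0.0, 0.0, 0.0]])

# desired output:
#   unified_verts -> [[0., 0., 0.], [1., 0., 0.]]
#   ref_list      -> [0, 1, 0]   (data[i] == unified_verts[ref_list[i]])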

The best I've been able to come up with using numpy so far has been the following:

def unify(raw_data):
    # first point must be new
    unified_verts = np.zeros((1, 3), dtype=np.float64)
    unified_verts[0] = raw_data[0]
    ref_list = [0]

    for i in range(1, len(raw_data)):
        point = raw_data[i]
        # linear scan: indices of rows in unified_verts equal to point
        index_array = np.where(np.all(point == unified_verts, axis=1))[0]

        # point not in array yet
        if len(index_array) == 0:
            point = np.expand_dims(point, 0)
            unified_verts = np.concatenate((unified_verts, point))
            ref_list.append(len(unified_verts) - 1)

        # point already exists
        else:
            ref_list.append(index_array[0])

    return unified_verts, ref_list

Testing using cProfile:

import cProfile
cProfile.run("unify(raw_data)")

On my machine this runs in 5.275 seconds. I've thought about using Cython to speed it up, but from what I've read, Cython doesn't typically run much faster than numpy methods. Any advice on ways to do this more efficiently?

Upvotes: 2

Views: 444

Answers (1)

unutbu

Reputation: 880537

Jaime has shown a neat trick which can be used to view a 2D array as a 1D array with items that correspond to rows of the 2D array. This trick can allow you to apply numpy functions which take 1D arrays as input (such as np.unique) to higher dimensional arrays.
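A minimal sketch of the trick itself (assuming a C-contiguous float64 array; the variable names here are just for illustration):

a = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [1.0, 2.0, 3.0]])

# pack each row's bytes into a single opaque void item
row_dtype = np.dtype((np.void, a.shape[1] * a.dtype.itemsize))
rows = np.ascontiguousarray(a).view(row_dtype).ravel()

# 1D functions now treat whole rows as single elements
print(len(np.unique(rows)))   # 2 distinct rows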

If the order of the rows in unified_verts does not matter (as long as the ref_list is correct with respect to unified_verts), then you could use np.unique along with Jaime's trick like this:

def unify2(raw_data):
    # view each row's bytes as one opaque void item so np.unique can
    # compare whole rows at once; requires raw_data to be C-contiguous
    dtype = np.dtype((np.void, raw_data.shape[1] * raw_data.dtype.itemsize))
    uniq, inv = np.unique(raw_data.view(dtype), return_inverse=True)
    # view back as floats and restore the (n_unique, 3) shape
    uniq = uniq.view(raw_data.dtype).reshape(-1, raw_data.shape[1])
    return uniq, inv

The result is the same in the sense that the raw_data can be reconstructed from the return values of unify (or unify2):

unified, ref = unify(raw_data)
uniq, inv = unify2(raw_data)
assert np.allclose(uniq[inv], unified[ref])  # raw_data

On my machine, unified, ref = unify(raw_data) requires about 51.390s, while uniq, inv = unify2(raw_data) requires about 0.133s (~ 386x speedup).
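As an aside, if your numpy is new enough (1.13+), np.unique can deduplicate rows directly via its axis argument, which sidesteps the void view entirely; a sketch of the same idea:

def unify3(raw_data):
    # np.unique with axis=0 treats each row as one item to deduplicate
    # (available in numpy >= 1.13)
    uniq, inv = np.unique(raw_data, axis=0, return_inverse=True)
    return uniq, inv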

Upvotes: 2
