slaw

Reputation: 6899

Efficient Way to Repeatedly Split Large NumPy Array and Record Middle

I have a large NumPy array nodes = np.arange(100_000_000) and I need to rearrange this array by:

  1. Recording and then removing the middle value in the array
  2. Split the array into the left half and right half
  3. Repeat Steps 1-2 for each half
  4. Stop when all values are exhausted

So, for a smaller input example nodes = np.arange(10), the output would be:

[5 2 8 1 4 7 9 0 3 6]

(5 is the middle of the full range; the halves [0 1 2 3 4] and [6 7 8 9] then contribute their middles 2 and 8, and so on level by level until every value has been recorded.)

This was accomplished by naively doing:

import numpy as np

def split(node, out):
    # Record the middle value, then return the left and right halves
    mid = len(node) // 2
    out.append(node[mid])
    return node[:mid], node[mid+1:]


def reorder(a):
    nodes = [a.tolist()]
    out = []

    # Breadth-first pass: split every current piece and keep the non-empty halves
    while nodes:
        tmp = []
        for node in nodes:
            for n in split(node, out):
                if n:
                    tmp.append(n)
        nodes = tmp

    return np.array(out)

if __name__ == "__main__":
    nodes = np.arange(10)
    print(reorder(nodes))

However, this is way too slow for nodes = np.arange(100_000_000) and so I am looking for a much faster solution.

Upvotes: 1

Views: 719

Answers (2)

Jérôme Richard

Reputation: 50836

You can vectorize your function with Numpy by working on groups of slices.

Here is an implementation:

import numpy as np

# Similar to [e for tmp in zip(a, b) for e in tmp],
# but on Numpy arrays and much faster
def interleave(a, b):
    assert len(a) == len(b)
    return np.column_stack((a, b)).reshape(len(a) * 2)

# n is the length of the input range (len(a) in your example)
def fast_reorder(n):
    if n == 0:
        return np.empty(0, dtype=np.int32)

    startSlices = np.array([0], dtype=np.int32)
    endSlices = np.array([n], dtype=np.int32)
    allMidSlices = np.empty(n, dtype=np.int32)  # Similar to "out" in your implementation
    midInsertCount = 0                               # Actual size of allMidSlices

    # Generate a bunch of middle values as long as there are valid slices to split
    while midInsertCount < n:
        # Generate the new mid/left/right slices
        midSlices = (endSlices + startSlices) // 2

        # Computing the next slices is not needed for the last step
        if midInsertCount + len(midSlices) < n:
            # Generate the next slices (some of which may be invalid)
            newStartSlices = interleave(startSlices, midSlices+1)
            newEndSlices = interleave(midSlices, endSlices)

            # Discard invalid slices
            isValidSlices = newStartSlices < newEndSlices
            startSlices = newStartSlices[isValidSlices]
            endSlices = newEndSlices[isValidSlices]

        # Fast appending
        allMidSlices[midInsertCount:midInsertCount+len(midSlices)] = midSlices
        midInsertCount += len(midSlices)

    return allMidSlices[0:midInsertCount]
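
For example, a quick check against the naive reorder from the question on a small input (assuming both functions are defined in the same session) could look like this:

nodes = np.arange(10)
print(fast_reorder(len(nodes)))                                   # [5 2 8 1 4 7 9 0 3 6]
print(np.array_equal(fast_reorder(len(nodes)), reorder(nodes)))   # True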

On my machine, this is 89 times faster than your scalar implementation with the input np.arange(100_000_000), dropping from 2min35 to 1.75s. It also consumes far less memory (roughly 3 to 4 times less). Note that if you want faster code, then you probably need to use a native language like C or C++.

Upvotes: 1

David Oldford

Reputation: 1175

Edit: The question has been updated to use a much smaller input array, so I leave the text below for historical reasons. The original size was likely a typo, but we often get accustomed to computers handling insanely large numbers, and when memory is involved they can be a real problem.

There is already a numpy-based solution submitted by someone else that I think fits the bill.

Your original code requires an insane amount of RAM just to hold 100 billion 64-bit integers. Do you have 800GB of RAM? You then convert the numpy array to a list, which will be substantially larger than the array: each packed 64-bit int in the numpy array becomes a much less memory-efficient Python int object, and the list holds a pointer to every one of them. Then you make a lot of slices of the list, which do not duplicate the data but do duplicate the pointers to the data and use even more RAM.

You also append all the result values to a list one value at a time. Lists are generally very fast for adding items, but at such an extreme size this is not only slow: lists over-allocate as they fill up, so you end up reserving more RAM than you need and doing many allocations and likely copies.

What kind of machine are you running this on? There are ways to improve your code, but unless you're running it on a supercomputer I don't know that you will ever finish that calculation. I only (only?) have 32GB of RAM, and I'm not going to even try to create a 100-billion-element int64 numpy array, as I don't want to use up SSD write life on a mass of virtual memory.
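
To get a rough feel for the gap, here's a small illustrative comparison (exact numbers depend on the Python build and version) between the footprint of a numpy array and that of the equivalent Python list:

import sys
import numpy as np

n = 1_000_000
arr = np.arange(n)        # packed 64-bit integers, 8 bytes each
lst = arr.tolist()        # one full Python int object per element

array_bytes = arr.nbytes
# The list stores a pointer per element, plus each int object's own overhead
list_bytes = sys.getsizeof(lst) + sum(sys.getsizeof(x) for x in lst)

print(f"numpy array: {array_bytes / 1e6:.1f} MB")
print(f"python list: {list_bytes / 1e6:.1f} MB")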

As for improving your code: stick to numpy arrays and don't convert to a Python list, since that greatly increases the RAM you need. Preallocate a numpy array to put the answer in. Then you need a new algorithm. Anything recursive, or recursive-like (i.e. a loop splitting the input), requires tracking a lot of state; your nodes list is going to be extraordinarily gigantic and, again, use a lot of RAM. You could use len(a) to mark values that have been removed from your list and scan through the entire array each time to figure out what to do next, but that saves RAM in exchange for a tremendous amount of searching through a gigantic array. I feel like there is an algorithm to cut numbers from each end and place them in the output while just tracking the beginning and end, but I haven't figured it out, at least not yet.
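
As one possible sketch of that direction (purely illustrative; the function name is made up for the example): keep only (start, end) index pairs in a queue and write each middle into a preallocated numpy array, so the element data is never copied and no Python list of values is built:

from collections import deque
import numpy as np

def reorder_indices(n):
    out = np.empty(n, dtype=np.int64)   # preallocated result
    if n == 0:
        return out
    pos = 0
    queue = deque([(0, n)])             # half-open [start, end) ranges
    while queue:
        start, end = queue.popleft()
        mid = (start + end) // 2
        out[pos] = mid                  # for np.arange(n) the value equals its index
        pos += 1
        if start < mid:                 # left half still has elements
            queue.append((start, mid))
        if mid + 1 < end:               # right half still has elements
            queue.append((mid + 1, end))
    return out

The queue still grows to a large number of index pairs at the widest level, so this mainly illustrates the index-tracking idea; the vectorized answer above keeps the same bookkeeping in compact numpy arrays instead.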

I also think there is a simpler algorithm where you just track the number of splits you've done instead of building a giant list of slices and keeping it all in memory. Take the middle of the left half, then the middle of the right, then count up one; when you take the middle of the left half's left half you know you have to jump to the right half, and since the count is one you jump over to the original right half's left half, and so on. Based on the depth into the halves and the length of the input, you should be able to jump around without scanning or tracking all of those slices, though I haven't been able to dedicate much time to thinking this through.

With a problem of this nature, if you really need to push the limits, you should consider using C/C++ so you can be as efficient as possible with RAM usage, and because you're doing an insane number of tiny operations, which doesn't map well to Python performance.

Upvotes: 1
