pythonarraysnumpyoptimizationscikit-image

Reputation: 23

How to create a binary 3-dimensional matrix of ellipsoids with centers from a list of points in Python?

The Goal

I have an M-length array of 3-dimensional coordinate points as floats. I would like to create a 3-dimensional numpy array of a predefined shape filled with ellipsoids of a given float radius centered on these points. Because this is intended for image manipulation, I call each value in the array a "pixel". If these ellipsoids overlap, I would like to assign the pixel to the closer center by euclidean distance. The final output will be a numpy array with a background of 0 and pixels within ellipsoids numbered 1, 2,... M, as corresponding to the initial list of coordinates, similar to the output of scipy's ndimage.label(...).

Below, I have a naive approach which considers every position in the output array and compares it to every defined center, creating a binary array with a value of 1 for any pixel inside any ellipsoid. It then uses scikit-image to watershed this binary array. While this code works, it is prohibitively slow for my use, both because it considers every pixel and center pair and because it performs watershedding seperately. How can I speed up this code?

Naive Example

def define_centromeres(template_image, centers_of_mass, xradius = 4.5, yradius = 4.5, zradius = 3.5):
    """ Creates a binary N-dimensional numpy array of ellipsoids.

    :param template_image: An N-dimensional numpy array of the same shape as the output array.
    :param centers_of_mass: A list of lists of N floats defining the centers of spots.
    :param zradius: A float defining the radius in pixels in the z direction of the ellipsoids.
    :param xradius: A float defining the radius in pixels in the x direction of the ellipsoids.
    :param yradius: A float defining the radius in pixels in the y direction of the ellipsoids.
    :return: A binary N-dimensional numpy array.
    """
    out = np.full_like(template_image, 0, dtype=int)
    for idx, val in np.ndenumerate(template_image):
        z, x, y = idx
        for point in centers_of_mass:
            pz, px, py = point[0], point[1], point[2]
            if (((z - pz)/zradius)**2 + ((x - px)/xradius)**2 + ((y - py)/yradius)**2) <= 1:
                out[z, x, y] = 1
                break
    return out

Scikit-image's watershed function; it is unlikely to find a speed-up by changing the code inside this method:

def watershed_image(binary_input_image):
    bg_distance = ndi.distance_transform_edt(binary_input_image,
                                             return_distances=True,
                                             return_indices=False)
    local_maxima = peak_local_max(bg_distance, min_distance=1, labels=binary_input_image)
    bg_mask = np.zeros(bg_distance.shape, dtype=bool)
    bg_mask[tuple(local_maxima.T)] = True
    marks, _ = ndi.label(bg_mask)
    output_watershed = watershed(-bg_distance, marks, mask=binary_input_image)
    return output_watershed

Small-scale example data:

zdim, xdim, ydim = 15, 100, 100
example_shape = np.zeros((zdim,xdim,ydim))
example_points = np.random.random_sample(size=(10,3))*np.array([zdim,xdim,ydim])
center_spots_image = define_centromeres(example_shape, example_points)
watershed_spots = watershed_image(center_spots_image)

Output:

center_spots_image, max projected to 2d

watershed_spots, max projected to 2d

Note that these images are only 2d representations of the final 3d output array.

Additional Notes

The typical size of the output array will be 31x512x512, or a total of 8.1e6 values, and the typical size of the input coordinates will be 40 3-dimensional coordinate points. I would like to optimize the speed of this procedure for this scale.

I am using numpy, scipy, and scikit-image in this project, and I must stick to these and other well-maintained and -documented packages.

Apologies for mistakes in the accessibility of the above code or for a lack of clarity in my explanation. I am a research scientist with little formalized computer science training.

Upvotes: 2

Answers (3)

loopy walt

Reputation: 968

Both the algorithm and its implementation do indeed leave some room for improvement. @Eric Johnson has covered the latter, now allow me to demonstrate another 20-fold speedup on the example by using a better algorithm.

Improvements: 1.Before masking restrict to an easily computed bounding box. 2. For overlap resolution recycle the distance calculations already done for ellipsoid computation.

Code (assumes Eric's function is alreaady defined):

import numpy as np

def rasterise(template_image, centers_of_mass, xradius=4.5, yradius=4.5, zradius=3.5):
    """Creates a labeled N-dimensional numpy array of ellipsoids.

    :param template_image: An N-dimensional numpy array of the same shape as the output array.
    :param centers_of_mass: A list of lists of N floats defining the centers of spots.
    :param zradius: A float defining the radius in pixels in the z direction of the ellipsoids.
    :param xradius: A float defining the radius in pixels in the x direction of the ellipsoids.
    :param yradius: A float defining the radius in pixels in the y direction of the ellipsoids.
    :return: An N-dimensional numpy array, with label `n` for the ellipsoid at index `n-1`.
    """
    sh = template_image.shape
    out = np.zeros(sh,int)
    aux = np.zeros(sh)
    radii = np.array([zradius,xradius,yradius])
    for j,com in enumerate(centers_of_mass,1):
        bboxl = np.floor(com-radii).clip(0,None).astype(int)
        bboxh = (np.ceil(com+radii)+1).clip(None,sh).astype(int)
        roi = out[tuple(map(slice,bboxl,bboxh))]
        roiaux = aux[tuple(map(slice,bboxl,bboxh))]
        logrid = *map(np.square,np.ogrid[tuple(
            map(slice,(bboxl-com)/radii,(bboxh-com-1)/radii,1j*(bboxh-bboxl)))]),
        dst = (1-sum(logrid)).clip(0,None)
        mask = dst>roiaux
        roi[mask] = j
        np.copyto(roiaux,dst,where=mask)
    return out
        

    
    
zdim, xdim, ydim = 15, 100, 100
example_shape = np.zeros((zdim,xdim,ydim))
example_points = np.random.random_sample(size=(10,3))*np.array([zdim,xdim,ydim])
center_spots_image = define_centromeres_labeled(example_shape, example_points)
csi = rasterise(example_shape, example_points)
print("number of pixels dfferent",np.count_nonzero(csi != center_spots_image),"out of",csi.size)
from timeit import timeit
print("Eric",timeit(lambda:define_centromeres_labeled(example_shape, example_points),number=10))
print("loopy",timeit(lambda:rasterise(example_shape, example_points),number=10))

Sample run:

number of pixels dfferent 0 out of 150000
Eric 0.37984768400201574
loopy 0.019632569048553705

Caveat:

Overlap resolution is sightly different between Eric's and my code. For example:

The difference is (I think) that Eric (top panel) uses standard Euclidean metric whereas I (bottom panel) use the metric suggested by the ellipsoids mostly out of opportunism but also because it may even be the right thing to do. Switching this over would be possible but at a speed penalty.

Upvotes: 2

yut23

Reputation: 3064

The golden rule of speeding up numpy code is to vectorize wherever possible. This moves loops from Python into much faster C routines. The main slowdown here comes from looping over every element in the array with np.ndenumerate. You can do this entirely within numpy using np.indices, plus some broadcasting to make the arrays line up:

def define_centromeres_vec(template_image, centers_of_mass, xradius=4.5, yradius=4.5, zradius=3.5):
    """Creates a binary N-dimensional numpy array of ellipsoids.

    :param template_image: An N-dimensional numpy array of the same shape as the output array.
    :param centers_of_mass: A list of lists of N floats defining the centers of spots.
    :param zradius: A float defining the radius in pixels in the z direction of the ellipsoids.
    :param xradius: A float defining the radius in pixels in the x direction of the ellipsoids.
    :param yradius: A float defining the radius in pixels in the y direction of the ellipsoids.
    :return: A binary N-dimensional numpy array.
    """
    out = np.zeros_like(template_image, dtype=int)
    # indices[:, z, x, y] = [z, x, y]
    indices = np.indices(template_image.shape, dtype=float)
    radii = np.asarray([zradius, xradius, yradius], dtype=float)
    for point in centers_of_mass:
        mask = np.sum(((indices - point[:,None,None,None]) / radii[:,None,None,None]) ** 2, axis=0) <= 1
        out[mask] = 1
    return out

That gives me a speedup of around 60x on the example data: 2.83s ± 116ms to 44.4ms ± 555µs, including watershed.

We can get a bit more speed if we move the broadcasting of radii outside the loop:

    # ...
    radii_bcast = np.broadcast_to(radii[:, None, None, None], shape=indices.shape)
    for point in centers_of_mass:
        mask = np.sum(((indices - point[:,None,None,None]) / radii_bcast) ** 2, axis=0) <= 1
        out[mask] = 1
    return out

This gets another factor of 2 or so for define_centromeres, but now a lot of the total time is spent in the watershed routine (21ms ± 60µs).

Finally, we can eliminate the watershed step by categorizing the pixels in the same loop. Instead of setting out[mask] to 1, we can just store the index of the COM (plus 1). Before that, we check for any overlap with previous ellipsoids with mask & out (which will be True anywhere the current ellipsoid overlaps previous ones). For any overlapping parts, we can use scipy.spatial.KDTree to get the closest COM for each point:

from scipy.spatial import KDTree

def define_centromeres_labeled(template_image, centers_of_mass, xradius=4.5, yradius=4.5, zradius=3.5):
    """Creates a labeled N-dimensional numpy array of ellipsoids.

    :param template_image: An N-dimensional numpy array of the same shape as the output array.
    :param centers_of_mass: A list of lists of N floats defining the centers of spots.
    :param zradius: A float defining the radius in pixels in the z direction of the ellipsoids.
    :param xradius: A float defining the radius in pixels in the x direction of the ellipsoids.
    :param yradius: A float defining the radius in pixels in the y direction of the ellipsoids.
    :return: An N-dimensional numpy array, with label `n` for the ellipsoid at index `n-1`.
    """
    out = np.zeros_like(template_image, dtype=int)
    # indices[:, z, x, y] = [z, x, y]
    indices = np.indices(template_image.shape, dtype=float)
    radii = np.asarray([zradius, xradius, yradius], dtype=float)
    radii_bcast = np.broadcast_to(radii[:, None, None, None], shape=indices.shape)
    tree = KDTree(centers_of_mass)
    for i, point in enumerate(centers_of_mass, start=1):
        mask = np.sum(((indices - point[:,None,None,None]) / radii_bcast) ** 2, axis=0) <= 1
        # check for overlap
        if np.any(mask & out):
            # get the overlapping points before modifying out
            overlap_mask = mask & out.astype(bool)
            overlap_idx = np.array(np.where(overlap_mask)).T
            out[mask] = i
            # get the closest center for all overlapping points
            out[overlap_mask] = tree.query(overlap_idx)[1] + 1
        else:
            out[mask] = i
    return out

This is about twice as fast as the previous method (20.2ms ± 384µs), with the added advantage that the ellipsoids are labeled according to their index and adjacent ellipsoids don't get merged together (this is a problem with watershed).

On my machine, it runs on full-size example data in 4.38s ± 43.9ms (31x512x512, 40 ellipsoids).

Upvotes: 2

Lukas S

Reputation: 3603

Very cool project. Thank you. The following code adds an ellipsoid to an existing array. Since my method does not rely on checking every pixel I don't think the total size of the picture should matter. It will depend mostly on the amount of ellipsoid and the radii. For your example radii it takes about ~209 ms ± 8.44 ms. So if all the other radii have similar size it should take ~8.36 s for your 40 points/ellipsoids. Which sounds feasible to me.

Also I believe this could be made faster using the symmetry of the ellipsoid. If one considers planes parallel to the coordinate planes going threw the center of the ellipsoid. Those planes divide the ellipsoid into 8 congruent parts. I believe one could calculate just one and mirror it into the other 7.

def generate_constraint(radii, x=np.nan, y=np.nan, z=np.nan):
    p = np.array([x,y,z]) 
    A = np.diag(1/np.array(radii))
    i = np.where(np.isnan(p))[0][0]   
    assert np.sum(np.isnan(p)) == 1
    
    def constraint(x):
        p[i] = x[0]
        return np.linalg.norm(A@(p-center))
    return NonlinearConstraint(constraint, -np.inf, 1)

def get_range(x0, x1, center, radii, x=np.nan, y=np.nan, z=np.nan):
    """takes fix coordinates for two of x,y,z, center and radii of
    the ellipsoid and guesses for the range of the third that lie in
    the ellipsoid and returns the range"""
    nlc = generate_constraint(radii, x, y, z)
    a = minimize(lambda x: x[0], x0=x0, constraints=[nlc], tol=0.1)
    b = minimize(lambda x: -x[0], x0=x1, constraints=[nlc], tol=0.1)
    return int(np.round(a.x.item())), int(np.round(b.x.item()))


def add_ellipsoid(out, center, radii):
    x0, x1 = center[0], center[0]
    y0, y1 = center[1], center[1]
    z0, z1 = center[2], center[2]
    
    x0, x1 = get_range(x0, x1, center, radii, x=np.nan, y=center[1], z=center[2])
    for x in np.arange(x0, x1+1):
        y0,y1 = get_range(y0, y1, center, radii, x=x, z=center[2])
        for y in np.arange(y0,y1+1):
            z0,z1 = get_range(z0, z1, center, radii, x=x, y=y)
            out[x,y,z0:z1+1] = 1
            
n = 25
center = np.array([n,n,n])/2
radii = [4.5, 4.5, 3.5]

out = np.zeros((n,n,n))
add_ellipsoid(out, center, radii)

Upvotes: 2

How to create a binary 3-dimensional matrix of ellipsoids with centers from a list of points in Python?

The Goal

Naive Example

Additional Notes

Answers (3)

Related Questions