adam steer

Reputation: 31

Improving a method for a spatially aware median filter for point clouds in Python

I have point cloud data from airborne LiDAR. It is noisy, so I want to run a median filter which collects points within N metres of each point, finds the median elevation value, and returns the neighbourhood median as an adjusted elevation value.

This is roughly analogous to gridding the data, and taking the median of elevations within each bin. Or scipy.signal.medfilt.

But - I want to preserve the location (x,y) of each point. Also I'm not sure that medfilt preserves the spatial information required.
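To illustrate, here's a minimal sketch of the gridded approach (made-up array names) using scipy.stats.binned_statistic_2d - it returns one median per grid cell and discards the individual (x, y) locations, which is exactly what I want to avoid:

import numpy as np
from scipy import stats

# hypothetical (N, 3) stand-in for a real cloud of x, y, z points
xyz = np.random.rand(1000, 3) * 100.0

# median elevation within each cell of a 20 x 20 grid
binned = stats.binned_statistic_2d(
    xyz[:, 0], xyz[:, 1], xyz[:, 2],
    statistic='median', bins=20)

# one value per cell - the per-point (x, y) locations are gone
print(binned.statistic.shape)   # (20, 20)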

I have a method, but it involves multiple for loops, which gets expensive when millions of points go in.

Updated - for each iteration of the first loop, a small patch of points is selected for the shapely intersection operation. The first version searched all input points for an intersection at every iteration. Now, only a small patch at a time is converted to a shapely geometry and used for the intersection:

import numpy as np
from shapely import geometry

def spatialmedian(pointcloud, radius):
    """
    Using shapely geometries, replace every point in a cloud with the
    median value of points within 'radius' units of the point.
    'pointcloud' must have no more than 3 dimensions (x, y, z).
    """
    new_z = []

    for point in pointcloud:
        # make the current point a shapely Point
        point = geometry.Point(point)

        # select a patch around the point, slightly larger than the
        # search radius, and make it a shapely MultiPoint
        patch = geometry.MultiPoint(list(pointcloud[
                       (pointcloud[:, 0] > point.x - (radius + 0.5)) &
                       (pointcloud[:, 0] < point.x + (radius + 0.5)) &
                       (pointcloud[:, 1] > point.y - (radius + 0.5)) &
                       (pointcloud[:, 1] < point.y + (radius + 0.5))
                       ]))

        # buffer the Point by radius
        pbuff = point.buffer(radius)

        # use the intersection method to find points in our
        # patch that lie inside the Point buffer
        isect = pbuff.intersection(patch)

        # initialise another list
        plist = []

        # for every intersection set, unpack it into a list
        # and collect the median Z value
        if isect.geom_type == 'MultiPoint':
            for p in isect.geoms:
                plist.append(p.z)

            new_z.append(np.median(plist))
        else:
            # if the intersection set isn't a MultiPoint, it is an
            # isolated point, whose median Z value is its own
            new_z.append(isect.z)

    # return a list of new median-filtered Z coordinates
    return new_z

This works by:

- making each point a shapely Point
- selecting a patch of points around it, just larger than the search radius, as a shapely MultiPoint
- buffering the Point by the radius and intersecting the buffer with the patch
- collecting the median Z value of the points inside the intersection

For 10^4 points I get a result in 11 seconds, and for 10^5 points in 3 minutes, but most of my datasets run to 2 - 5 * 10^6 points. On a 2 * 10^6 point cloud it's been running overnight.

What I want is a faster/more efficient method!

I've been tinkering with python-pcl, which is fast for filtering point clouds, but I don't know how to return the indices of points which pass/fail python-pcl's filters. I need those indices because each point has other attributes which must remain attached to it.
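As a sketch of the bookkeeping I'm after, in plain numpy with a made-up threshold test rather than python-pcl's API:

import numpy as np

# hypothetical (N, 4) array of x, y, z, intensity points
xyzi = np.random.rand(1000, 4)

# any per-point boolean test keeps each point's position in the array...
mask = xyzi[:, 2] < 0.5

# ...so the indices of points that pass the filter fall out directly,
indices = np.nonzero(mask)[0]

# and carry every other attribute along with the filtered points
filtered = xyzi[indices]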

If anyone can suggest a more efficient method, please do so - I would greatly appreciate your help. If it can't go any faster and this code is helpful, feel free to use it.

Thanks!

Upvotes: 1

Views: 1525

Answers (1)

adam steer

Reputation: 31

After some good advice, I tried this:

# import numpy and scikit-learn's neighbours module
import numpy as np
from sklearn import neighbors as nb

#make a little time ticker
from datetime import datetime
startTime = datetime.now()

# xyzi is assumed to be an (N, 4) array of x, y, z, intensity
# points, loaded earlier. Generating the KDTree takes ~95% of
# the processing time
tree = nb.KDTree(xyzi[:, 0:3], leaf_size=60)

# how long did tree generation take
print(datetime.now() - startTime)

#initialise a list
new_z = []

#for each point, collect neighbours within radius r
nhoods = tree.query_radius(xyzi[:,0:3], r=3)

# iterate through the list of neighbourhoods,
# find the median height, and add it to the output list
for point in nhoods:
    new_z.append(np.median(xyzi[point,2]))

# how long did it take?
print(datetime.now() - startTime)

This version took ~33 minutes for just under two million points. Acceptable, but it could still be better.

Can the KDTree generation go faster, for example with numba's @jit?
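(I suspect @jit can't reach inside scikit-learn's compiled tree build; one alternative I haven't benchmarked - so treat this as an assumption - is scipy's cKDTree, whose query_ball_point also returns neighbour indices:)

import numpy as np
from scipy.spatial import cKDTree

# hypothetical stand-in for the real (N, 4) cloud
xyzi = np.random.rand(10000, 4)

# both the build and the query run in compiled C code
tree = cKDTree(xyzi[:, 0:3], leafsize=60)
nhoods = tree.query_ball_point(xyzi[:, 0:3], r=3)

# same median step as above
new_z = [np.median(xyzi[idx, 2]) for idx in nhoods]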

Is there a better method than looping through all the neighbourhoods to find the neighbourhood medians? Here, nhoods is an array of arrays - I thought something like:

median = np.median(nhoods[:][:,2])

...might work, but it didn't.
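I think the reason it fails is that query_radius returns a one-dimensional object array of index arrays, so nhoods[:] is just a copy of that 1-D array and the second index [:,2] raises an IndexError. The indices have to be applied to xyzi one neighbourhood at a time - a comprehension at least keeps it compact:

import numpy as np

# nhoods as returned by tree.query_radius: a 1-D object array whose
# elements are integer index arrays of varying length, so
# nhoods[:][:, 2] has too many indices for a 1-D array

# apply each neighbourhood's indices to the cloud, then take the median
new_z = np.array([np.median(xyzi[idx, 2]) for idx in nhoods])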

Thanks!

Upvotes: 2
