Gianni Spear
Gianni Spear

Reputation: 7924

Python: How to improve performance of a script to obtain the Bounding Box of a set of points

I have a set of points (x and y) and i wish to know the maximum and Minimum values of X and Y (the Bounding Box). I coded these lines where i read all points with list comprehension and after i use max and min on X and Y. In the end i delete the points.

This solution is not memory efficency because i need to read all points

points = [(p.x,p.y) for p in lasfile.File(inFile,None,'r')] # read in list comprehension
X_Max = max(zip(*points)[0])
X_Min = min(zip(*points)[0])
Y_Max = max(zip(*points)[1])
Y_Min = min(zip(*points)[1])
del points

I am asking a suggetion to avoid this step (store all points in the memory). Thanks in advance Gianni

Upvotes: 3

Views: 173

Answers (2)

Steve Mayne
Steve Mayne

Reputation: 22818

X_Max = float('-inf')
X_Min = float('+inf')
Y_Max = float('-inf')
Y_Min = float('+inf')

for p in lasfile.File(inFile,None,'r'):
    X_Max = max(X_Max, p.x)
    X_Min = min(X_Min, p.x)
    Y_Max = max(Y_Max, p.y)
    Y_Min = min(Y_Min, p.y)

This way you only loop once over your file, as well as avoiding having more than one point in memory at a time.

EDIT File() is providing a iterator, which is only reading one line at a time from the file and supplying it to the loop variable p, as and when required.

In your question you used square brackets around your initial points assignment. This is a list comprehension which, as the name suggests, creates a list - so all points are held in memory from that point on. If you used parentheses instead like this:

points = ((p.x,p.y) for p in lasfile.File(inFile,None,'r'))

X_Max = float('-inf')
X_Min = float('+inf')
Y_Max = float('-inf')
Y_Min = float('+inf')

for p in points:
    X_Max = max(X_Max, p.x)
    X_Min = min(X_Min, p.x)
    Y_Max = max(Y_Max, p.y)
    Y_Min = min(Y_Min, p.y)

...then Python doesn't create a list, but a generator/iterator - which would return one point at a time until the file is exhausted. This would avoid having all points in memory at the same time - but can only be iterated over once.

For the sake of simplicity though, I've dropped the creation of an additional iterator in favour of just using the lasfile.File() one directly.

Upvotes: 5

halex
halex

Reputation: 16403

You could use a generator expression for points and use the key argument for max and min:

from itertools import tee
points = ((p.x,p.y) for p in lasfile.File(inFile,None,'r'))
points = tee(points, 4)

X_Max = max(points[0], key=lambda x:x[0])[0]
X_Min = min(points[1], key=lambda x:x[0])[0]
Y_Max = max(points[2], key=lambda x:x[1])[1]
Y_Min = min(points[3], key=lambda x:x[1])[1]

Update:

I added a call to itertools.tee to duplicate the original generator.

As noted in the comments, this solution has the downside that you have to (unnecessaryly) iterate over your file 4 times. Calculating the maxes and mins on every iteration, as @SteveMayne does, saves you from this.

Upvotes: 3

Related Questions