Reputation: 2074
I have a line in my code that currently does this at each step x
:
myList = [(lo,hi) for lo,hi in myList if lo <= x <= hi]
This is pretty slow. Is there a more efficient way to eliminate things from a list that don't contain a given x
?
Upvotes: 2
Views: 108
Reputation:
Here I'm going to suggest what may seem like a really dumb solution favoring micro-optimizations over algorithmic ones. It'll depend on your specific needs.
The ultimate question is this: is a single linear pass over your array (list in Python), on average, expensive? In other words, is searching for lo/high pairs that contain x
going to generally yield results that are very small (ex: 1% of the overall size of the list), or relatively quite large (ex: 25% or more of the original list)?
If the answer is the latter, you might actually get a more efficient solution keeping a basic, contiguous, cache-friendly representation that you're accessing sequentially. The hardware cache excels at plowing through contiguous data where multiple adjacent elements fit into a cache line sequentially.
What you want to avoid in such a case is the expensive linear-time removal from the middle of the array as well as possibly the construction of a new one. If you trigger a linear-time operation for every single individual element you remove from the array, then naturally that's going to get very expensive very quickly.
To exchange that linear-time operation for a much faster constant-time one, all we have to do when we want to remove an element at a certain index in the array is to overwrite the element at that index with the element at the back of the array (last element). Now simply remove the redundant duplicate from the back of the array (a removal from the back of an array is a constant-time operation, and often involves just basic arithmetical instructions).
If your needs fit the criteria, then this can actually give you better results than a smarter algorithm. It's one of the peculiar cases where the practice can trump the theory due to the skewed performance of the hardware cache over DRAM, but if you're performing these types of hi/lo queries repeatedly and wanting to get very narrow results, then something smarter like an interval tree or at least sorting the data to allow binary searches can be considerably better.
Upvotes: 0
Reputation: 344
While you don't give much context, I'll assume the rest of the loop looks like:
for x in xlist:
myList = [(lo,hi) for lo,hi in myList if lo <= x <= hi]
In this case, if may be more efficient to construct an interval tree (http://en.wikipedia.org/wiki/Interval_tree) first. Then, for each x
you walk the tree and find all intervals which intersect with x
; add these intervals to a set as you find them.
Upvotes: 0
Reputation: 129487
Perhaps you're looking for an interval tree. From Wikipedia:
In computer science, an interval tree is an ordered tree data structure to hold intervals. Specifically, it allows one to efficiently find all intervals that overlap with any given interval or point.
So, instead of storing the (lo, hi)
pairs sequentially in a list, you would have them define the intervals in an interval tree. Then you could perform queries on the tree with x
, and retain only the intervals that overlap x
.
Upvotes: 3