Reputation: 5082
I have a list of Item objects that have a date attribute. I also have a a single date I am grabbing from the database.
I want to search through the list, find all of the list items that are greater than the date I have returned from the database.
My list of Items objects has over a thousand objects in it, so I want to be as efficient as possible.
I assume that looping over every item in my list and checking if it is greater than the date I returned from the db is not the most efficient way to do this.
class Item(object):
def __init__(self, title, link, description, date):
self.title = title
self.link = link
self.description = description
self.date = date
item_list = []
...
#assume I populate the list with a 1,000 Item objects
filtered_list = []
for it in item_list:
if date_from_db > it.date:
filtered_list.append(it)
Upvotes: 2
Views: 3899
Reputation: 52313
Sort the list then bisect it. This works the same if you're looking for one specific item and will also solve the puzzle of finding an object based on it's attributes.
# add __eq__ and __lt__ functions to your object so they'll sort by date
class Item(object):
"""blah blah as above"""
def __eq__(self, other):
return self.date == other.date
def __lt__(self, other):
"""Order by date ascending"""
return self.date < other.date
# sort that puppy
item_list = sorted(item_list)
# mock up an item to faux search for
item_from_db = Item(None, None, None, date_from_db)
# find the index of it's rightful place
idx = bisect.bisect(item_list, item_from_db)
# aberacadabera
return item_list[idx:]
Implementing an append
routine for filtered_list
which uses bisect.insort
, rather than sorting the whole list in one hit, seems likely to offer performance gains as well. Not tested exactly as posted, but should be enough to get you there.
Upvotes: -1
Reputation: 86778
The only way to avoid looping over every item in your list is to sort it by date and then just searching it backwards until you find the last item that's greater than your target date, adding them to your filtered_list
as you go.
Or sort your list descending and search forwards until you find the first last item that's greater than your target. This would let you easily modify your loop like this:
filtered_list = []
for it in item_list:
if date_from_db > it.date:
filtered_list.append(it)
else:
break
Alternatively, if you expect more than a few items in your filtered list, it may be faster to do a binary search to find the first item that meets your criteria and use a list slice to copy it to filtered_list
:
first = binary_search(item_list, lambda it: cmp(date_from_db, it.date))
if first == -1:
return []
return item_list[first:]
And here's a binary search function I adapted from the link in the last paragraph. I believe it should work:
def binary_search(a, comp, lo=0, hi=None):
if hi is None:
hi = len(a)
while lo < hi:
mid = (lo+hi)//2
cmpval = comp(a[mid])
if cmpval < 0:
lo = mid+1
elif cmpval > 0:
hi = mid
else:
return mid
return -1
Upvotes: 2
Reputation: 71054
In response to a claim that this list comp is confusing, I'll post a way to format it that makes it clear. I've been using this a lot recently.
filtered_list = [item # What we're collecting
for item in item_list # What we're collecting it over
if date_from_db < item.date] # the conditions
It does turn what could be a one liner into a three liner like it would be with a regular for loop but in cases much worse than this (and even here) it improves readability and lets you have the improved efficiency.
Upvotes: 1
Reputation: 19478
You can use filter:
filtered_list = filter(lambda(item): date_from_db < item.date, item_list)
Or you can use for comprehension:
filtered_list = [item for item in item_list if date_from_db < item.date]
I believe that people prefer the latter more often, but I like the former. Lambda is just an inline function - you can make that function explicit if you like.
Upvotes: 0
Reputation: 18385
List comprehensions are a fairly efficient way to do this outside of a database:
[it for it in item_list if date_from_db > it.date]
Otherwise, you could use the filter
builtin:
filter(lambda it: it if date_from_db > it.date, item_list)
Upvotes: 5