imns
imns

Reputation: 5082

Search a list of objects in Python

I have a list of Item objects that have a date attribute. I also have a a single date I am grabbing from the database.

I want to search through the list, find all of the list items that are greater than the date I have returned from the database.

My list of Items objects has over a thousand objects in it, so I want to be as efficient as possible.

I assume that looping over every item in my list and checking if it is greater than the date I returned from the db is not the most efficient way to do this.

class Item(object):    
    def __init__(self, title, link, description, date):
        self.title = title
        self.link = link
        self.description = description
        self.date = date

item_list = [] 
...
#assume I populate the list with a 1,000 Item objects

filtered_list = []
for it in item_list:
    if date_from_db > it.date:
        filtered_list.append(it)

Upvotes: 2

Views: 3899

Answers (5)

John Mee
John Mee

Reputation: 52313

Sort the list then bisect it. This works the same if you're looking for one specific item and will also solve the puzzle of finding an object based on it's attributes.

# add __eq__ and __lt__ functions to your object so they'll sort by date
class Item(object):
    """blah blah as above"""
    def __eq__(self, other):            
        return self.date == other.date

    def __lt__(self, other):
        """Order by date ascending"""
        return self.date < other.date

# sort that puppy
item_list = sorted(item_list)

# mock up an item to faux search for
item_from_db = Item(None, None, None, date_from_db)

# find the index of it's rightful place
idx = bisect.bisect(item_list, item_from_db)

# aberacadabera
return item_list[idx:]

Implementing an append routine for filtered_list which uses bisect.insort, rather than sorting the whole list in one hit, seems likely to offer performance gains as well. Not tested exactly as posted, but should be enough to get you there.

Upvotes: -1

Gabe
Gabe

Reputation: 86778

The only way to avoid looping over every item in your list is to sort it by date and then just searching it backwards until you find the last item that's greater than your target date, adding them to your filtered_list as you go.

Or sort your list descending and search forwards until you find the first last item that's greater than your target. This would let you easily modify your loop like this:

filtered_list = [] 
for it in item_list: 
    if date_from_db > it.date: 
        filtered_list.append(it) 
    else:
        break

Alternatively, if you expect more than a few items in your filtered list, it may be faster to do a binary search to find the first item that meets your criteria and use a list slice to copy it to filtered_list:

first = binary_search(item_list, lambda it: cmp(date_from_db, it.date))
if first == -1:
    return []
return item_list[first:]

And here's a binary search function I adapted from the link in the last paragraph. I believe it should work:

def binary_search(a, comp, lo=0, hi=None): 
    if hi is None: 
        hi = len(a) 
    while lo < hi: 
        mid = (lo+hi)//2 
        cmpval = comp(a[mid])
        if cmpval < 0:
            lo = mid+1 
        elif cmpval > 0:
            hi = mid 
        else: 
            return mid 
    return -1 

Upvotes: 2

aaronasterling
aaronasterling

Reputation: 71054

In response to a claim that this list comp is confusing, I'll post a way to format it that makes it clear. I've been using this a lot recently.

filtered_list = [item                         # What we're collecting
                 for item in item_list        # What we're collecting it over 
                 if date_from_db < item.date] # the conditions

It does turn what could be a one liner into a three liner like it would be with a regular for loop but in cases much worse than this (and even here) it improves readability and lets you have the improved efficiency.

Upvotes: 1

Michael Neale
Michael Neale

Reputation: 19478

You can use filter:

filtered_list = filter(lambda(item): date_from_db < item.date, item_list)

Or you can use for comprehension:

filtered_list = [item for item in item_list if date_from_db < item.date]

I believe that people prefer the latter more often, but I like the former. Lambda is just an inline function - you can make that function explicit if you like.

Upvotes: 0

Tim McNamara
Tim McNamara

Reputation: 18385

List comprehensions are a fairly efficient way to do this outside of a database:

[it for it in item_list if date_from_db > it.date]

Otherwise, you could use the filter builtin:

filter(lambda it: it if date_from_db > it.date, item_list)

Upvotes: 5

Related Questions