Reputation: 809
I've a doubt about lambda expression with internal list comprehension operation.
In the following code, the lambda will instantiate a list for each item every time?
def _find_items_not_present_in_store(self, store_today, store_yesterday):
# finding what items are not in the store anymore
items_not_in_store_anymore = filter(lambda item1: item1.item_id not in
[item2.item_id for item2 in store_today.store_items],
store_yesterday)
return items_not_in_store_anymore
Would be better have this list
[item2.item_id for item2 in store.store_items]
instantiated outside of the lambda expression?
I couldn't find any documentation about it.
Upvotes: 0
Views: 735
Reputation: 110136
You are performing a linear search for each item in your list - and that is definitely sub-optimal. For a store with 1 million items in stock, hat can lead to the order of (1000000)² comparisons, which can be quite a burden even to fast computers. That is just to start
The thing to do is to create a set with the ID's of one of the collections, and use set's "contains" (the same in
operator) - which searches in constant time.
def _find_items_not_present_in_store(self, store_today, store_yesterday):
yesterday_ids = set(item.item_id for item in store_yesterday)
return [item for item in store_today if item.item_id not in yesterday_ids]
And - in your code - in addition to searching in lists and not in a set, you are actually recreating the whole yesterday's ID list for each item in the today's list - as your list generator expression is inside the lambda function. In the approach above, I pre-calculate the ID set just once -aas that is what makes sense.
Besides that, as you can see, list comprehension and generator expressions in Python have an if
clause that supersedes the usage of the filter
function - filter
only makes sense when one choose to use a functional notation instead of generators/comprehensions - and in most cases will have the overhead of one additional function call.
Upvotes: 1
Reputation: 24052
The way you've written it, the list is part of the lambda expression, so it will be evaluated each time the lambda is invoked.
Here's the most efficient way to implement your function:
def _find_items_not_present_in_store(self, store_today, store_yesterday):
s = set(item2.item_id for item2 in store_today.store_items)
items_not_in_store_anymore = [item1 for item1 in store_yesterday
if item1.item_id not in s]
return items_not_in_store_anymore
This does 2 main things to improve the efficiency:
Upvotes: 1
Reputation: 101909
Every call of the lambda
function will re-create that list, so moving that construction outside the lambda
will improve performance.
Moreover using a list
to check for in
isn't a good idea, since it takes linear time. Consider using a set
instead:
def _find_items_not_present_in_store(self, store_today, store_yesterday):
today_ids = {item2.item_if for item2 in store_today.store_items}
items_not_in_store_anymore = filter(
lambda item1: item1.item_id not in today_ids,
store_yesterday
)
return items_not_in_store_anymore
In old versions of python you need to do set( ... )
instead of the set
-comprehension { ... }
.
Upvotes: 1