Reputation: 2149
I have a situation where I need to loop over two lists of objects, find the equal ones, and then loop over their fields and change some attributes. It looks like this:
for new_product in products_and_articles['products']:
    for old_product in products_for_update:
        if new_product.article == old_product.article:
            for old_field in old_product._meta.get_all_field_names():
                for new_field in new_product._meta.get_all_field_names():
                    if old_field == new_field and old_field != 'id' and old_field != 'slug':
                        setattr(old_product, old_field, getattr(new_product, old_field))
Obviously this is far from being good or even acceptable, so I'm seeking advice on how I can avoid so much looping and improve the algorithm.
Upvotes: 0
Views: 207
Reputation: 8966
It helps if you break the process into logical, reusable parts.
for new_product in products_and_articles['products']:
    for old_product in products_for_update:
        if new_product.article == old_product.article:
            …
For example, what you are doing here is finding the product that matches a particular article. Since article is unique, we could write something like this:
def find_products_by_article(products, article):
    '''Find the product that matches the given article.  Returns
    either the product or None if it doesn't exist.'''
    for product in products:
        if product.article == article:
            return product
    return None
Then call it with:
for old_product in products_for_update:
    new_product = find_products_by_article(
        products_and_articles['products'],
        old_product.article)
    …
But this could be much more efficient if we could take advantage of a data structure that is optimized for look-ups, namely a dict (constant instead of linear complexity). So what we could do instead is:
# build a dictionary that stores products indexed by article
products_by_article = {product.article: product for product in
                       products_and_articles['products']}

for old_product in products_for_update:
    try:
        # look up the product in the dictionary
        new_product = products_by_article[old_product.article]
    except KeyError:
        # silently ignore products that don't exist
        continue
    …
If you do such lookups frequently, it would be better to reuse the products_by_article dictionary elsewhere as well instead of building one from scratch every time. Be careful though: if you use multiple representations of the product records, you need to make sure they always stay in sync!
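For instance, a minimal sketch of keeping such a shared index in sync might look like this (the helper names here are only illustrative, not something from the code above):

# Hypothetical helpers: if all additions and article changes go through
# these functions, the shared index can never go stale.
products_by_article = {}

def register_product(product):
    # add or replace the index entry for this product
    products_by_article[product.article] = product

def change_article(product, new_article):
    # move the index entry when a product's article changes
    products_by_article.pop(product.article, None)
    product.article = new_article
    products_by_article[new_article] = product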
For the inner loops, notice that new_field here only serves as a check for whether a field exists:
…
for old_field in old_product._meta.get_all_field_names():
    for new_field in new_product._meta.get_all_field_names():
        if old_field == new_field and old_field != 'id' and old_field != 'slug':
            setattr(old_product, old_field, getattr(new_product, old_field))
(Note that this is slightly suspicious: any new fields that don't already exist in the old_product are silently discarded. Is this intentional?)
This can be repackaged as follows:
def transfer_fields(old, new, exclusions=('id', 'slug')):
    '''Update all pre-existing fields in the old record to have
    the same values as the new record.  The 'exclusions' parameter
    can be used to exclude certain fields from being updated.'''
    # use a set here for efficiency reasons
    fields = set(old._meta.get_all_field_names())
    fields.intersection_update(new._meta.get_all_field_names())
    fields.difference_update(exclusions)
    for field in fields:
        setattr(old, field, getattr(new, field))
Putting all this together:
# dictionary of products indexed by article
products_by_article = {product.article: product for product in
                       products_and_articles['products']}

for old_product in products_for_update:
    try:
        new_product = products_by_article[old_product.article]
    except KeyError:
        continue  # ignore non-existent products
    transfer_fields(old_product, new_product)
This final code has a time complexity of O(n × k), where n is the number of products and k is the number of fields.
Upvotes: 5
Reputation: 6014
We start with four loops and a time complexity of O(n^2*k^2), n being the number of items and k being the number of attributes. Let's see what we can do.
First of all, get rid of the new_field loop; you don't need it. Change:
for old_field in old_product._meta.get_all_field_names():
    for new_field in new_product._meta.get_all_field_names():
        if old_field == new_field and old_field != 'id' and old_field != 'slug':
            setattr(old_product, old_field, getattr(new_product, old_field))
To:
for old_field in old_product._meta.get_all_field_names():
    if old_field != 'id' and old_field != 'slug':
        setattr(old_product, old_field, getattr(new_product, old_field))
Got it to O(n^2*k). Now for the product finding part.
First, get the two lists sorted, then proceed like you do when you merge lists in merge sort:
a = sorted(products_and_articles['products'], key=lambda x: x.article)
b = sorted(products_for_update, key=lambda x: x.article)
i = j = 0
while i < len(a) and j < len(b):
    if a[i].article < b[j].article:
        i += 1
        continue
    if a[i].article > b[j].article:
        j += 1
        continue
    ...logic...
    i += 1  # Maybe you want to get rid of this one, I'm not sure..
    j += 1
Depending on the size of your database, this may or may not be adequate, because it requires you to build new sorted lists. That is not very heavy in memory (it's only references anyway), but if you have really long lists and limited space, the big efficiency win may not compensate.
Got it down to O(n*log(n)*k), which is the best I could do. You can probably get it even lower using dictionaries, but that requires you to change your db, so it takes more time and effort.
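For reference, an in-memory dictionary variant (just a sketch under the assumption that an index can be built on the fly for this one pass, without touching the db) could look roughly like this:

# rough sketch: one pass to build the index, one pass to update
index = {p.article: p for p in products_and_articles['products']}

for old_product in products_for_update:
    new_product = index.get(old_product.article)
    if new_product is None:
        continue  # no matching article
    for field in old_product._meta.get_all_field_names():
        if field != 'id' and field != 'slug':
            setattr(old_product, field, getattr(new_product, field))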
Upvotes: 1
Reputation: 1476
The first two for loops can be changed to:
from itertools import product

for new_product, old_product in product(list1, list2):
    # logic and other loops
and you can do the same for the two inner loops, replacing

for old_field in old_product._meta.get_all_field_names():
    for new_field in new_product._meta.get_all_field_names():

with:

for old_field, new_field in product(list1, list2):
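Put together, the substitution might look roughly like this (a sketch reusing the names from the question; note that it only flattens the nesting, the number of iterations stays the same):

from itertools import product

# same iterations as the original four nested loops, written as two
for new_product, old_product in product(products_and_articles['products'],
                                        products_for_update):
    if new_product.article != old_product.article:
        continue
    for old_field, new_field in product(old_product._meta.get_all_field_names(),
                                        new_product._meta.get_all_field_names()):
        if old_field == new_field and old_field != 'id' and old_field != 'slug':
            setattr(old_product, old_field, getattr(new_product, old_field))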
Upvotes: 0
Reputation: 107347
You can use set to find the intersection instead of looping over both lists and checking for equality:
set(products_and_articles['products']).intersection(set(products_for_update))
Example:
>>> l=[1,2,3]
>>> a=[2,3,4]
>>> set(l).intersection(set(a))
set([2, 3])
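Applied to the question's data, where the two lists hold distinct objects, the intersection would have to be taken on the article values rather than on the objects themselves; a sketch of that (with hypothetical dict names) might be:

# index both lists by article, then intersect the key sets
new_by_article = {p.article: p for p in products_and_articles['products']}
old_by_article = {p.article: p for p in products_for_update}

for article in set(new_by_article) & set(old_by_article):
    new_product = new_by_article[article]
    old_product = old_by_article[article]
    # ...update old_product's fields from new_product here...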
Upvotes: 1