micgeronimo
micgeronimo

Reputation: 2149

More effective loop in Python

I have situation where I need to loop over two lists of objects and find equal and then loop over their fields and change some attributes. Looks it like this

for new_product in products_and_articles['products']:
  for old_product in products_for_update:
    if new_product.article == old_product.article:
      for old_field in old_product._meta.get_all_field_names():
        for new_field in new_product._meta.get_all_field_names():
          if old_field == new_field and old_field != 'id' and old_field != 'slug':
            setattr(old_product, old_field, getattr(new_product, old_field))

Obviously this is far from being good or even acceptable. So I'm seeking for advice how can I avoid so much looping and enhance the algorythm

Upvotes: 0

Views: 207

Answers (4)

Rufflewind
Rufflewind

Reputation: 8966

It helps if you break the process into logical, reusable parts.

for new_product in products_and_articles['products']:
  for old_product in products_for_update:
    if new_product.article == old_product.article:
      …

For example, here what you are doing is find a product that matches a particular article. Since article is unique, we could write something like this:

def find_products_by_article(products, article):
  '''Find all products that match the given article.  Returns
  either a product or 'None' if it doesn't exist.'''
  for products in products:
    return product

Then call it with:

for old_product in products_for_update:
  new_products = find_products_by_article(
                   products_and_articles['products'],
                   old_product.article)
  …

But this could be much more efficient if we could take advantage of a data structure that is optimized for look-ups, namely a dict (constant instead of linear complexity). So what we could do instead is:

# build a dictionary that stores products indexed by article
products_by_article = dict(product.article, product for product in
                           products_and_articles['products'])

for old_product in products_for_update:
  try:
    # look up product in the dictionary
    new_product = products_by_article[old_product.article]
  except KeyError:
    # silently ignore products that don't exist
    continue
  …

If you do such lookups frequently, it would be better to reuse the products_by_article dictionary elsewhere as well instead of building one from scratch every time. Be careful though: if you use multiple representations of the product records, you need make they always stay in sync!

For the inner loops, notice that new_field here only serves as a check for whether a field exists:

…
  for old_field in old_product._meta.get_all_field_names():
    for new_field in new_product._meta.get_all_field_names():
      if old_field == new_field and old_field != 'id' and old_field != 'slug':
        setattr(old_product, old_field, getattr(new_product, old_field))

(Note that this is slightly suspicious: any new fields that don't already exist in the old_product are silently discarded: is this intentional?)

This can be repackaged as follows:

def transfer_fields(old, new, exclusions=('id', 'slug')):
  '''Update all pre-existing fields in the old record to have
  the same values as the new record.  The 'exclusions' parameter
  can be used to exclude certain fields from being updated.'''
  # use a set here for efficiency reasons
  fields = frozenset(old._meta.get_all_field_names())
  fields.difference_update(new._meta.get_all_field_names())
  fields.difference_update(exclusions)
  for field in fields:
    setattr(old, field, getattr(new, field))

Putting all this together:

# dictionary of products indexed by article
products_by_article = dict(product.article, product for product in
                           products_and_articles['products'])

for old_product in products_for_update:
  try:
    new_product = products_by_article[old_product.article]
  except KeyError:
    continue          # ignore non-existent products
  transfer_fields(old_product, new_product)

This final code has a time complexity of O(n × k), where n is the number of products and k is the number of fields.

Upvotes: 5

unddoch
unddoch

Reputation: 6014

We start with four loops and an efficiency of O(n^2*k^2), n being the number of items and k being the number of attributes. Let's see what we can do.

First of all, get rid of the new_product loop, you don't need it:

for old_field in old_product._meta.get_all_field_names():
    for new_field in new_product._meta.get_all_field_names():
        if old_field == new_field and old_field != 'id' and old_field != 'slug':
            setattr(old_product, old_field, getattr(new_product, old_field))

To:

for old_field in old_product._meta.get_all_field_names():
    if old_field != 'id' and old_field != 'slug':
        setattr(old_product, old_field, getattr(new_product, old_field))

Got it to O(n^2*k). Now for the product finding part.

First, get the two lists sorted, then proceed like you do when you merge lists in merge sort:

a = sorted(products_and_articles['products'], key=lambda x: x.article)
b = sorted(products_for_update, key=lambda x: x.article)
i = j = 0
while(i < len(a) and j < len(b)):
    if (a[i].article < b[j].article):
        a += 1
        continue
    if (a[i].article > b[j].article):
        b += 1
        continue
    ...logic...
    a += 1  # Maybe you want to get rid of this one, I'm not sure..
    b += 1

Depending on the size of your database, it might be more or less adequate, because it requires you to make new sorted lists. Not very heavy in memory (it's only refs anyway), but if you have really long lists and limited space the huge efficiency win may not compensate.

Got it down to O(n*logn*k), it's the best I could do. You can, probably, get it even lower using dictionaries, but it requires you to change your db, so it requires more time and effort.

Upvotes: 1

Urban48
Urban48

Reputation: 1476

the first two for's can be changed to:

from itertools import product


for new_product, old_product in product(list1, list2)
    # logic and other loops

and you can do the same for the two inner loops:

 for old_field in old_product._meta.get_all_field_names():
    for new_field in new_product._meta.get_all_field_names():
for old_field, new_field in product(list1, list2)

Upvotes: 0

Kasravnd
Kasravnd

Reputation: 107347

You can use set to find the intersection instead loop over both lists and check for equality :

set(products_and_articles['products']).intersection(set(products_for_update))

example :

>>> l=[1,2,3]
>>> a=[2,3,4]
>>> set(l).intersection(set(a))
set([2, 3])

Upvotes: 1

Related Questions