Tinker
Tinker

Reputation: 4525

How can I make this queryset more tolerable?

I have the following class:

class Instance(models.Model):
    products = models.ManyToManyField(Product, blank=True)


class Product(models.Model):
    description = HTMLField(blank=True, null=True)
    short_description = HTMLField(blank=True, null=True)

And this form that I use to update Instances

class InstanceModelForm(InstanceValidatorMixin, UpdateInstanceLastUpdatedMixin, forms.ModelForm):
    class Meta:
        model = Instance
    products = forms.ModelMultipleChoiceField(required=False, queryset=Product.objects.annotate(i_count=Count('instance')).order_by('i_count'))

My instance-product table is sizable (~ 1000 rows) and ever since I've added the queryset for products I am seeing web requests that are timing out due heroku's 30 second request limit.

My goal is to do something to this queryset such that my users are no longer timing out.

I have the following insights:

  1. Accuracy doesn't matter as much to me - It doesn't have to be very accurate. Yes I would like to sort products by the count of instances this product has linked to but if it's off by 5 or 10 it doesn't really matter that much.

  2. Limited number of products - When my users are selecting products to be linked to an instance, they are primarily interested in products with less than 10 total linkages to instances. I don't know if a partial query will be accurate, but if this is possible I am open to trying.

  3. Effort - I know there are frameworks out there that I can install to cache many things. I am looking for something that is light weight and requires less than 1 hr to get up and running.

Upvotes: 1

Views: 52

Answers (1)

Daniel Hepper
Daniel Hepper

Reputation: 29977

First I would want to ensure that the performance issue actually comes from the query. I've tried to reproduce your problem:

>>> Instance.objects.count()
102499
>>> Product.objects.count()
1000
>>> sum(p.instance_set.count() for p in Product.objects.all())/Product.objects.count()
273.084
>>> list(Product.objects.annotate(i_count=Count('instance')).order_by('i_count'))
[...]
>>> from django.db import connection
>>> connection.queries[-1]
{'sql': 'SELECT "products_product"."id", "products_product"."description", "products_product"."short_description", COUNT("products_instance_products"."instance_id") AS "i_count" FROM "products_product" LEFT OUTER JOIN "products_instance_products" ON ("products_product"."id" = "products_instance_products"."product_id") GROUP BY "products_product"."id", "products_product"."description", "products_product"."short_description" ORDER BY "i_count" ASC', 'time': '0.189'}

By accident, I created a dataset that is probably quite a bit bigger than yours. As you can see, I have 1000 Products with an average of ~273 related Instances, but the query still takes less than a second (both on SQLite and PostgreSQL).

Use a one-off dyno with heroku run bash and check if you get the same numbers.

My guess is that your performance issues are either caused by

  • an n+1 query, where an extra query is made for each Product, e.g. in your Product.__str__ method.
  • the actual rendering of the MultipleChoiceField field. By default, it will render as a <select> with an <option> for each Product. This can be quite slow, and even it wasn't, it would pretty inconvenient to use. You might want to use a different widget, like django-select2.

Upvotes: 1

Related Questions