Hassan Baig
Hassan Baig

Reputation: 15824

A big and ever-growing table in my Django app needs a better performance strategy

For first-time Internet users from developing economies with very basic, non-js supporting mobile phone devices and flaky Internet coverage, I host a simple Django-based message board.

The "home" page of this message board is the main gig. It's where everyone can just directly write anything. Moreover, users constantly refresh this page to see if anything new has appeared from other users (it's read heavy); treating the place as a public chat room.

This home page is fed by a table that is big (>1M rows) and ever-growing (>10K added every day; accelerating). It's represented by a class-based ListView in views.py. I happen to need only the latest ~200 rows from this big table (ordered by '-id'), since my users don't dig into the past. Thus in the CBV, queryset = Post.objects.order_by('-id').exclude(submitter_id__in=condemned)[:200] (condemned are users who are hell-banned, submitter is a User foreignkey). Moreover, there is a whole bunch of additional stuff happening in get_context_data().

According to new-relic, this CBV is by far the most time-consuming view I have in my app. Running SELECT on the big table takes by far the most time. I can't cache the home page because users are treating this message board as quasi-chat.

Could I have framed that queryset slice differently, or that's the best I can do? From your experience, does anything jump out that you think I ought to do to improve the performance of this CBV? I'm open to sharing more information, so ask away if needed. My db is postgresql.

Upvotes: 1

Views: 191

Answers (2)

Dmitry Yudin
Dmitry Yudin

Reputation: 978

You need to create index on submitter_id field of PostgreSQL. Also if you want to users to see messages instantly you can use Meteor DDP for django, in short words it will expose objects collections via Distributed Data Protocol and then in a front-end MeteorJS will catch changes and show new messages without need to reload (it uses sockets). Just try my app I wrote at ioaware.meteor.com, I created it in 2 days, just register and add a lesson, and look for this process in separate browser

Upvotes: 0

Sayse
Sayse

Reputation: 43300

Moved my comments into an answer

I would imagine that the thing that is taking the longest time is the ordering.

From the docs for order_by:

Ordering is not a free operation. Each field you add to the ordering incurs a cost to your database.

Judging by your query, you don't really need to order_by at all and can just take from the opposite end like you have done in your comment

Post.objects.exclude(submitter_id__in=condemned)[200:]

This would work as by default, a models ordering which order_by uses from the models Meta class is id.

The only other thing that may be making a difference depends on what condemned is, I'm hoping its a queryset too but if it isn't (i.e some read in text file) I would look into that also.

Since you have said that condemned is a list derived from values_list.

values_list might not help I don't think because I think it resolves the query, You might want to try with either .all or just .values('id'). Values returns a ValuesQuerySet which if im not mistaken doesn't resolve the query. I haven't looked into what values_list returns though

Edit: I've just looked and values_list returns a ValuesListQuerySet, and obviously a dictionary lookup (which values uses) would be faster than a list lookup

Upvotes: 1

Related Questions