EazyC
EazyC

Reputation: 819

Django: Get 10 random instances from a queryset and order them into a new queryset?

I want to create a dynamic homepage for my app that features 10 different pages/profiles of people using the site each time the home page is accessed. I know that the django SQL query for randomized query is extremely slow so I am trying to write my own method of doing this (pseudo)random sample by creating an empty list, and creating a list of random numbers and then grabbing the random nth element of the queryset and placing that into the list.

import random
profilelist = [] #create an empty list
qindex = ProfilePage.objects.filter(profileisbannedis=False) #queryset for all of possible profiles to be displayed
randlist = random.sample(xrange(qindex.count()), 10) #create a list of 10 numbers between range 0 and the size of the queryset. 
#this method also does not repeat the same randomly generated number which is ideal since I don't want to feature the same profile twice
for i in randlist: 
    tile = qindex[i] #for each random number created, get that element of the queryset
    profilelist.extend(tile) #place each object in the previous queryset into a new list of objects and continue extending the list for each of 10 random numbers

I can't really figure out how to do it this way since I know I get the error "object is not iterable" on the last line of the code so creating a new queryset piece by piece like this is not the proper way. How can I go about doing this/creating my random queryset made from a previous filtered queryset?

Upvotes: 8

Views: 15944

Answers (5)

Saurabh Goyal
Saurabh Goyal

Reputation: 647

One thing you can do is take a list of ids (assuming, 'id' is the primary key) of random elements in the queryset, and then filter on those. Something like the code below:

import random
valid_profiles_id_list = ProfilePage.objects.filter(profileisbannedis=False).values_list('id', flat=True)
random_profiles_id_list = random.sample(valid_profiles_id_list, min(len(valid_profiles_id_list), 10))
query_set = ProfilePage.objects.filter(id__in=random_profiles_id_list)

Hope it helps, also please go through django queryset docs

Upvotes: 18

karuoro
karuoro

Reputation: 571

I prefer to shuffle the entire queryset everytime page loads and slice the first ten objects in the template. In the views:

ids = [i.id for i in Model.objects.filter(some_field=some_variable)]
random.shuffle(ids)
shuffled = [Model.objects.get(id=i) for i in ids]

In the template:

{% for i in shuffled|slice:'10' %}

Upvotes: 0

Don Kirkby
Don Kirkby

Reputation: 56700

Before you get more complicated than necessary, I suggest you test whether order_by('?') is actually slow on your database.

In your question, you say:

I know that the django SQL query for randomized query is extremely slow so I am trying to write my own method of doing this ...

Here's what the Django documentation says:

Note: order_by('?') queries may be expensive and slow, depending on the database backend you’re using.

So you should check whether your database is one with a performance problem here.

The other answers suggest two options:

  1. Choose random numbers between 0 and count-1, then filter based on those ids.
  2. Select a list of all the ids in the database, then randomly choose some from the list, and filter based on those ids.

Option 1 has problems with the boundary conditions and with gaps in the id numbers. Option 2 makes two database queries, one of which returns all id numbers in the database. Both options may return the final selection sorted by id number.

With all of those problems and the increased complexity, you should at least measure the benefit to decide if it's worthwhile.

Here's what happened when I measured the performance of option 1 against order_by('?') in a SQLite3 database selecting 10 out of 1000 small records:

Select in database with random order:
205, 49, 28, 542, 428, 1, 337, 860, 374, 303
[8.38821005821228, 7.809916019439697, 7.193678855895996, 8.39355993270874, 8.132720947265625]
Filter by random id numbers:
135, 357, 406, 476, 552, 580, 662, 663, 670, 889
[8.62951397895813, 8.145615100860596, 8.251683950424194, 7.629027843475342, 7.384187936782837]

Here's the result when I tried the same thing in PostgreSQL:

Select in database with random order:
117, 337, 160, 500, 468, 178, 845, 542, 735, 525
[13.016371965408325, 12.65379810333252, 12.106752872467041, 12.485779047012329, 12.837188959121704]
Filter by random id numbers:
59, 65, 108, 161, 213, 246, 301, 813, 854, 969
[18.311591863632202, 20.5823872089386, 13.955725193023682, 13.034253120422363, 13.079485177993774]

If there is a significant difference, the order_by('?') looks better. How does it look in your database? If you decide to go with option 1, change the boundaries to match your id numbers. If you decide to go with option 2, the other answers look fine.

Here's the code I used to test the SQLite3 version. You can save it to a file and run it as is:

# Tested with Django 1.9.2
import sys
import timeit
from random import sample

import django
from django.apps import apps
from django.apps.config import AppConfig
from django.conf import settings
from django.db import connections, models, DEFAULT_DB_ALIAS
from django.db.models.base import ModelBase

NAME = 'udjango'
SELECT_COUNT = 10
base_query = None


def select_in_database():
    chosen = base_query.order_by('?')[:SELECT_COUNT]
    return list(chosen)


def select_by_random_id():
    db_size = base_query.count()
    random_ids = sample(xrange(db_size), SELECT_COUNT)
    chosen = base_query.filter(id__in=random_ids)
    return list(chosen)


def main():
    global base_query
    setup()

    class Person(models.Model):
        first_name = models.CharField(max_length=30)
        last_name = models.CharField(max_length=30)

    syncdb(Person)

    for i in range(1000):
        Person.objects.create(first_name=str(i), last_name=str(i))

    base_query = Person.objects.all()

    print('Select in database with random order:')
    print(', '.join(person.first_name for person in select_in_database()))
    print(timeit.repeat('select_in_database()',
                        'from __main__ import select_in_database',
                        repeat=5,
                        number=10000))
    print('Filter by random id numbers:')
    print(', '.join(person.first_name for person in select_by_random_id()))
    print(timeit.repeat('select_by_random_id()',
                        'from __main__ import select_by_random_id',
                        repeat=5,
                        number=10000))


def setup():
    DB_FILE = NAME + '.db'
    with open(DB_FILE, 'w'):
        pass  # wipe the database
    settings.configure(
        DEBUG=True,
        DATABASES={
            DEFAULT_DB_ALIAS: {
                'ENGINE': 'django.db.backends.sqlite3',
                'NAME': DB_FILE}},
        LOGGING={'version': 1,
                 'disable_existing_loggers': False,
                 'formatters': {
                    'debug': {
                        'format': '%(asctime)s[%(levelname)s]'
                                  '%(name)s.%(funcName)s(): %(message)s',
                        'datefmt': '%Y-%m-%d %H:%M:%S'}},
                 'handlers': {
                    'console': {
                        'level': 'DEBUG',
                        'class': 'logging.StreamHandler',
                        'formatter': 'debug'}},
                 'root': {
                    'handlers': ['console'],
                    'level': 'WARN'},
                 'loggers': {
                    "django.db": {"level": "WARN"}}})
    app_config = AppConfig(NAME, sys.modules['__main__'])
    apps.populate([app_config])
    django.setup()
    original_new_func = ModelBase.__new__

    @staticmethod
    def patched_new(cls, name, bases, attrs):
        if 'Meta' not in attrs:
            class Meta:
                app_label = NAME
            attrs['Meta'] = Meta
        return original_new_func(cls, name, bases, attrs)
    ModelBase.__new__ = patched_new


def syncdb(model):
    """ Standard syncdb expects models to be in reliable locations.

    Based on https://github.com/django/django/blob/1.9.3
    /django/core/management/commands/migrate.py#L285
    """
    connection = connections[DEFAULT_DB_ALIAS]
    with connection.schema_editor() as editor:
        editor.create_model(model)


main()

Upvotes: 5

gzerone
gzerone

Reputation: 2227

First, I think you should replace 'extend' with 'append' for your profilelist, qindex[i] is not iterable.

Second, I feel the simplest way is:

q_ids = qindex.values_list('id', flat=True)
r_ids = random.sample(q_ids, 10)
return qindex.filter(id__in=r_ids)

Have a try, :)

Upvotes: 5

Austin A
Austin A

Reputation: 3138

After a quick test, I see that using xrange with random.sample does provide a list back so the xrange isn't your problem.

>>> import random
>>> a = xrange(100)
>>> rnd = random.sample(a, 10)
>>> rnd
[41, 83, 89, 73, 37, 58, 38, 99, 10, 84]

I've done this before with django. Below is a code snippet from the app. The only thing I'm doing different is using count() on all objects rather than a filter. My next suggestion is to make sure the count on the django filter is what you expect.

# Choose 10 random records to show
num_entities = Entity.objects.all().count()
rand_entities = random.sample(range(num_entities), 10)
sample_entities = Entity.objects.filter(eid__in=rand_entities)

Upvotes: 6

Related Questions