Reputation: 819
I want to create a dynamic homepage for my app that features 10 different pages/profiles of people using the site each time the home page is accessed. I know that the django SQL query for randomized query is extremely slow so I am trying to write my own method of doing this (pseudo)random sample by creating an empty list, and creating a list of random numbers and then grabbing the random nth element of the queryset and placing that into the list.
import random
profilelist = [] #create an empty list
qindex = ProfilePage.objects.filter(profileisbannedis=False) #queryset for all of possible profiles to be displayed
randlist = random.sample(xrange(qindex.count()), 10) #create a list of 10 numbers between range 0 and the size of the queryset.
#this method also does not repeat the same randomly generated number which is ideal since I don't want to feature the same profile twice
for i in randlist:
tile = qindex[i] #for each random number created, get that element of the queryset
profilelist.extend(tile) #place each object in the previous queryset into a new list of objects and continue extending the list for each of 10 random numbers
I can't really figure out how to do it this way since I know I get the error "object is not iterable" on the last line of the code so creating a new queryset piece by piece like this is not the proper way. How can I go about doing this/creating my random queryset made from a previous filtered queryset?
Upvotes: 8
Views: 15944
Reputation: 647
One thing you can do is take a list of ids (assuming, 'id' is the primary key) of random elements in the queryset, and then filter on those. Something like the code below:
import random
valid_profiles_id_list = ProfilePage.objects.filter(profileisbannedis=False).values_list('id', flat=True)
random_profiles_id_list = random.sample(valid_profiles_id_list, min(len(valid_profiles_id_list), 10))
query_set = ProfilePage.objects.filter(id__in=random_profiles_id_list)
Hope it helps, also please go through django queryset docs
Upvotes: 18
Reputation: 571
I prefer to shuffle the entire queryset everytime page loads and slice the first ten objects in the template. In the views:
ids = [i.id for i in Model.objects.filter(some_field=some_variable)]
random.shuffle(ids)
shuffled = [Model.objects.get(id=i) for i in ids]
In the template:
{% for i in shuffled|slice:'10' %}
Upvotes: 0
Reputation: 56700
Before you get more complicated than necessary, I suggest you test whether order_by('?')
is actually slow on your database.
In your question, you say:
I know that the django SQL query for randomized query is extremely slow so I am trying to write my own method of doing this ...
Here's what the Django documentation says:
Note:
order_by('?')
queries may be expensive and slow, depending on the database backend you’re using.
So you should check whether your database is one with a performance problem here.
The other answers suggest two options:
count-1
, then filter based on those ids.Option 1 has problems with the boundary conditions and with gaps in the id numbers. Option 2 makes two database queries, one of which returns all id numbers in the database. Both options may return the final selection sorted by id number.
With all of those problems and the increased complexity, you should at least measure the benefit to decide if it's worthwhile.
Here's what happened when I measured the performance of option 1 against order_by('?')
in a SQLite3 database selecting 10 out of 1000 small records:
Select in database with random order:
205, 49, 28, 542, 428, 1, 337, 860, 374, 303
[8.38821005821228, 7.809916019439697, 7.193678855895996, 8.39355993270874, 8.132720947265625]
Filter by random id numbers:
135, 357, 406, 476, 552, 580, 662, 663, 670, 889
[8.62951397895813, 8.145615100860596, 8.251683950424194, 7.629027843475342, 7.384187936782837]
Here's the result when I tried the same thing in PostgreSQL:
Select in database with random order:
117, 337, 160, 500, 468, 178, 845, 542, 735, 525
[13.016371965408325, 12.65379810333252, 12.106752872467041, 12.485779047012329, 12.837188959121704]
Filter by random id numbers:
59, 65, 108, 161, 213, 246, 301, 813, 854, 969
[18.311591863632202, 20.5823872089386, 13.955725193023682, 13.034253120422363, 13.079485177993774]
If there is a significant difference, the order_by('?')
looks better. How does it look in your database? If you decide to go with option 1, change the boundaries to match your id numbers. If you decide to go with option 2, the other answers look fine.
Here's the code I used to test the SQLite3 version. You can save it to a file and run it as is:
# Tested with Django 1.9.2
import sys
import timeit
from random import sample
import django
from django.apps import apps
from django.apps.config import AppConfig
from django.conf import settings
from django.db import connections, models, DEFAULT_DB_ALIAS
from django.db.models.base import ModelBase
NAME = 'udjango'
SELECT_COUNT = 10
base_query = None
def select_in_database():
chosen = base_query.order_by('?')[:SELECT_COUNT]
return list(chosen)
def select_by_random_id():
db_size = base_query.count()
random_ids = sample(xrange(db_size), SELECT_COUNT)
chosen = base_query.filter(id__in=random_ids)
return list(chosen)
def main():
global base_query
setup()
class Person(models.Model):
first_name = models.CharField(max_length=30)
last_name = models.CharField(max_length=30)
syncdb(Person)
for i in range(1000):
Person.objects.create(first_name=str(i), last_name=str(i))
base_query = Person.objects.all()
print('Select in database with random order:')
print(', '.join(person.first_name for person in select_in_database()))
print(timeit.repeat('select_in_database()',
'from __main__ import select_in_database',
repeat=5,
number=10000))
print('Filter by random id numbers:')
print(', '.join(person.first_name for person in select_by_random_id()))
print(timeit.repeat('select_by_random_id()',
'from __main__ import select_by_random_id',
repeat=5,
number=10000))
def setup():
DB_FILE = NAME + '.db'
with open(DB_FILE, 'w'):
pass # wipe the database
settings.configure(
DEBUG=True,
DATABASES={
DEFAULT_DB_ALIAS: {
'ENGINE': 'django.db.backends.sqlite3',
'NAME': DB_FILE}},
LOGGING={'version': 1,
'disable_existing_loggers': False,
'formatters': {
'debug': {
'format': '%(asctime)s[%(levelname)s]'
'%(name)s.%(funcName)s(): %(message)s',
'datefmt': '%Y-%m-%d %H:%M:%S'}},
'handlers': {
'console': {
'level': 'DEBUG',
'class': 'logging.StreamHandler',
'formatter': 'debug'}},
'root': {
'handlers': ['console'],
'level': 'WARN'},
'loggers': {
"django.db": {"level": "WARN"}}})
app_config = AppConfig(NAME, sys.modules['__main__'])
apps.populate([app_config])
django.setup()
original_new_func = ModelBase.__new__
@staticmethod
def patched_new(cls, name, bases, attrs):
if 'Meta' not in attrs:
class Meta:
app_label = NAME
attrs['Meta'] = Meta
return original_new_func(cls, name, bases, attrs)
ModelBase.__new__ = patched_new
def syncdb(model):
""" Standard syncdb expects models to be in reliable locations.
Based on https://github.com/django/django/blob/1.9.3
/django/core/management/commands/migrate.py#L285
"""
connection = connections[DEFAULT_DB_ALIAS]
with connection.schema_editor() as editor:
editor.create_model(model)
main()
Upvotes: 5
Reputation: 2227
First, I think you should replace 'extend' with 'append' for your profilelist, qindex[i] is not iterable.
Second, I feel the simplest way is:
q_ids = qindex.values_list('id', flat=True)
r_ids = random.sample(q_ids, 10)
return qindex.filter(id__in=r_ids)
Have a try, :)
Upvotes: 5
Reputation: 3138
After a quick test, I see that using xrange
with random.sample
does provide a list back so the xrange isn't your problem.
>>> import random
>>> a = xrange(100)
>>> rnd = random.sample(a, 10)
>>> rnd
[41, 83, 89, 73, 37, 58, 38, 99, 10, 84]
I've done this before with django. Below is a code snippet from the app. The only thing I'm doing different is using count()
on all objects rather than a filter. My next suggestion is to make sure the count on the django filter is what you expect.
# Choose 10 random records to show
num_entities = Entity.objects.all().count()
rand_entities = random.sample(range(num_entities), 10)
sample_entities = Entity.objects.filter(eid__in=rand_entities)
Upvotes: 6