Hassan Baig
Hassan Baig

Reputation: 15824

Speeding up nested Django ORM queries

I'm using the handy Django sessions library in my Django project. This allows me to process Session objects via ORM queries.

The attributes I can access for each Session object are:

    Column     |           Type           | Modifiers 
---------------+--------------------------+-----------
 session_key   | character varying(40)    | not null
 session_data  | text                     | not null
 expire_date   | timestamp with time zone | not null
 user_id       | integer                  | 
 user_agent    | character varying(200)   | not null
 last_activity | timestamp with time zone | not null
 ip            | inet                     | not null

Where user_id is from the Django User model.

Using the Session library, I need to find the number of users in my app who currently don't have an entry in the Session table (and their corresponding IDs).

I tried it via the following:

logged_in_users = set(Session.objects.values_list('user_id',flat=True))
logged_in_users = [user_pk for user_pk in logged_in_users if user_pk is not None]
logged_out_users = set(User.objects.exclude(id__in=logged_in_users).values_list('id',flat=True))
num_logged_out = len(logged_out_users) #passed to template to display

My Session table contains 1.7M rows, whereas the User table contains 408K rows. The code above take an abnormally large amount of processing time (i.e. several minutes), and ultimately gives me a 500 error in production (works correctly on limited data sets in development).

Before trouble shooting what went wrong, I feel I also ought to optimize the query to get the result less expensively.

What are some obvious optimizations you feel my code needs? I know I can find the number of logged_out_users by subtracting the number of logged_in_users from total_users. But what about getting all their IDs?

Upvotes: 0

Views: 99

Answers (1)

nik_m
nik_m

Reputation: 12086

[UPDATE]: After some discussion in the comments, the question asked about retrieving (and counting) the ids of the logged out Users (i.e users that don't have an entry in the Session table at all).

So:

# Get a (flat) list of user ids where the user entry in not null
logged_in_users = Session.objects.filter(user__isnull=False).values_list('user__id', flat=True).distinct()

# Get a (flat) list of user ids excluding the previous logged ones (thus, this list will contain the logged out users)
logged_out_users = User.objects.exclude(id__in=logged_in_users).values_list('id', flat=True).distinct()

# Count the logged out users
logged_out_users_count = logged_out_users.count()

How about this:

# Fetch all user_id values from the Session table where the user ForeignKey has
# no value (thus, it's null)
null_user_ids = Session.objects.filter(user__isnull=True).values_list('user__id', flat=True)

# Count the null users
no_of_users = null_user_ids.count()

Upvotes: 1

Related Questions