Reputation: 517
I have a user table which has columns user_id, update_time, etc and an action table, which has columns user_id, action and create_time table.
class User(models.Model):
user_id = models.CharField(db_index = True, max_length = 255, unique = True, null = False)
update_time = models.DateTimeField(db_index = True, default = timezone.now, null = True, blank = True)
class Action(models.Model):
user_id = models.CharField(db_index = True, max_length = 255, null = False)
action = models.CharField(db_index = True, max_length = 15, unique = False, null = False)
create_time = models.DateTimeField(db_index = True, auto_now_add = True, null = True)
I want to save user's last active time in update_time column. I am getting many actions by many users per day. So I don't update user table update_time column while inserting an action. I update the update_time column by a background job, which finds the max of create_time of all actions corresponding to a user_id and update his update_time column in the user table.
The background job runs below code/query for this purpose. But the performance of this piece of code not so good. Can anyone help me optimize it, either a better MySQL query or even in a format of Django ORM query or any different strategy to do this overall thing?
days_limit = datetime.now() - timedelta(1)
query = "UPDATE user a JOIN (SELECT user_id, MAX(create_time) AS last_create_time FROM user_action WHERE create_time >= %s GROUP BY user_id) b ON a.user_id = b.user_id SET a.update_time = last_create_time WHERE a.update_time < last_create_time"
cursor = connection.cursor()
print cursor.execute(query, [str(days_limit)])
Upvotes: 1
Views: 947
Reputation: 53774
class User(models.Model):
user_id = models.CharField(db_index = True, max_length = 255,
If this is a numeric user_id
and not a user name, this column doesn't make any sense. It should be removed. One of the critical ways of speeding up mysql queries is to reduce the size of the data and the indices on disk. This column seems to be redundant of if it needs to be preserved it should be int.
class Action(models.Model):
user_id = models.CharField(db_index = True, max_length = 255, null = False)
As above but what you should really be having here is
class Action(models.Model):
user_id = models.ForeignKey(User)
because without a foreign key, you don't have a relationship between the two models and that's why you are forced to use raw queries rather than an ORM query
I am getting many actions by many users per day. So I don't update user table update_time column while inserting an action
If you had the correct table structure, doing many many updates each day wouldn't be slow at all. Updating a single row that's referred to by it's primary key will in fact be very fast. Much faster than the complex join query you are using now.
If you are really concerned about the few milliseconds that such an update would take, you can use celery to run it in the background.
UPDATE user a JOIN
(SELECT user_id, MAX(create_time) AS last_create_time FROM user_action WHERE create_time >= %s GROUP BY user_id) b
ON a.user_id = b.user_id SET a.update_time = last_create_time WHERE a.update_time < last_create_time"
To speed this up you need a multicolumn index on user_id and create_time but the user_id column is redundant ....
Upvotes: 1