Adam
Adam

Reputation: 20922

Partitioning a Users Table

I have a table that holds all system users. As time passes some user accounts end up dormant and unused in this table.

If I had a large number of users it could be good to partition the table by date (last logon date) and then query for users based on date. eg: you login and your lastlogondate is updated which is what the partitioning uses.

Question: would this mean all the current users would be say in the first partition as their date stamp is current. Could this be a good way to keep the current pool of active users in one partition and the reset of obsolute accounts would end up residing in other partitions depending on there last active date. (Note: if the user logged in again they would become active again).

Would this be a good way to speed up queries?

Upvotes: 1

Views: 1313

Answers (1)

Bill Karwin
Bill Karwin

Reputation: 562731

No, I would not use this method of partitioning.

To benefit from MySQL partition pruning, your queries must reference the column of the partition key. So for your proposed partitioning scheme to give any benefit, all your queries for users would need to reference the last_login column, and you would need to update the value you compare against the last_login column continually:

SELECT ... FROM users WHERE user_name = 'Adam' AND last_login >= '2013-01-01'
-- remember to change this to '2013-02-01' by next month

Without the term searching for specific last_login, the query would have to scan all the partitions.

I'd also be worried about "partition churn" that is, rows moving from one partition to another frequently.

Also recall that in MySQL, the partition column must be part of every primary or unique key in the table. So using last_login as the partition key would require you to define your table:

CREATE TABLE Users (
  user_name VARCHAR(12) NOT NULL,
  last_login DATETIME NOT NULL,
  ...
  PRIMARY KEY (user_name, last_login)
);

This opens up the possible data anomaly of another user creating account "Adam", as long as they have a distinct last login time. Both accounts could exist in the same table for an indefinite length of time, until the two Adams happen to log in at the exact same time. Then one may be denied login because of a primary key violation. That would be a very puzzling reason to be denied a login.

A slightly better partitioning scheme would be this:

CREATE TABLE Users (
  user_name VARCHAR(12) NOT NULL,
  last_login DATETIME NOT NULL,
  is_archived TINYINT(1) NOT NULL DEFAULT 0,
  ...
  PRIMARY KEY (user_name, is_archived)
) PARTITION BY HASH(is_archived) PARTITIONS 2;

The intent is that you periodically run a job to manually archive users:

UPDATE Users SET is_archived=1 WHERE last_login < CURDATE() - INTERVAL 30 DAY;

This solves the partition churn problem and the partition-creation chore problem. It still potentially allows more than one "Adam" to exist, but if you carefully control the instances of moving a row from one partition to the other, that should be lower risk.

You'd still have to reference the partition key in your queries, but the value you compare against would be fixed:

SELECT ... FROM users WHERE user_name = 'Adam' AND is_archived = 0;

Upvotes: 3

Related Questions