kmoney12
kmoney12

Reputation: 4480

Clustered Index Or Partition Table?

I have a file-sharing site where my users are interested in clicks on their files. Each click is stored as a new row in the clicks table.

Usually, they want to know how many clicks they got on a certain date range:

$statement = $db->prepare("SELECT COUNT(DISTINCT ip) FROM clicks WHERE user_id=? AND time BETWEEN ? AND ?");
$statement->execute(array($user_id, $from_date, $to_date));

Additionally, they can also see the number of clicks for a certain file:

$statement = $db->prepare("SELECT COUNT(DISTINCT ip) FROM clicks WHERE file_id=? AND time BETWEEN ? AND ?");
$statement->execute(array($file_id, $from_date, $to_date));

The problem with these queries, is that user_id and file_id are not keys for this table (they are not unique). Instead, a simple 'id' column is the primary key but it never plays into any of the queries.

I have been researching clustered indexes but I cannot figure out how to implement it in this case.

As the clicks table is growing pretty large (5-6 million rows) these queries are taking longer (and I plan for this table to get a lot bigger). I read that partitioning might be what I need to do?

Do I need to make a clustered key, partition the table, or both?

For reference, the clicks structure:

id time user_id ip file_id

Upvotes: 2

Views: 499

Answers (1)

Bill Karwin
Bill Karwin

Reputation: 562230

You don't need to change the clustered index.

I would suggest creating these indexes:

ALTER TABLE clicks ADD INDEX (file_id, time, ip),
                   ADD INDEX (user_id, time, ip);

By including the ip in the index definition, each queries should be able to get all the information needed from the index structure itself. This is called a covering index. Then the query won't need to touch the table at all, so it doesn't matter which columns comprise the clustered index for the table.

If you use EXPLAIN to analyze the query, you should see "Using index" in the Extra field, and this indicates the query is getting the benefit of a covering index.

I don't think partitioning will help in this case, because MySQL partitioning requires that the partition columns must be part of any primary key / unique key of the table.

Upvotes: 3

Related Questions