Reputation: 3583
SELECT
b.categoryid,
SUM(viewcount) AS cnt,
categoryname
FROM
bookvisit AS bv
INNER JOIN book AS b ON b.isbn = bv.isbn
LEFT JOIN category AS c ON b.categoryid = c.categoryid
WHERE
b.categoryid IS NOT NULL AND
b.categoryid <> 0
GROUP BY
b.categoryid
ORDER BY
cnt DESC,
bv.isbn
LIMIT 0, 4
I have three tables: book (book information), bookvisit (book visit info), and category (category master).
What I need is the popular categories. The query above is fine with two eq_ref lookups, but EXPLAIN also shows Using temporary; Using filesort.
Any help?
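For reference, the tables look roughly like this (simplified to the relevant columns; types are approximate):
CREATE TABLE category (
    categoryid   INT NOT NULL PRIMARY KEY,
    categoryname VARCHAR(100)
);
CREATE TABLE book (
    isbn         VARCHAR(13) NOT NULL PRIMARY KEY,
    categoryid   INT
);
CREATE TABLE bookvisit (
    isbn         VARCHAR(13) NOT NULL,  -- references book.isbn
    viewcount    INT,                   -- per-visit count, summed in the query above
    visit_time   DATETIME               -- when the visit happened
);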
Upvotes: 1
Views: 170
Reputation: 425331
As @dj_segfault said, you cannot have an index on an aggregate column in MySQL,
so you will have to write a service that caches the SUMs in a snapshot table (which you can index).
Here's how you can do it and still have accurate statistics:
Create a snapshot table with the columns category and cnt, and a PRIMARY KEY on category.
Create a single-field, single-record table called snapshot_time with the column taken.
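Something like this would do for the two tables (the exact types are up to you; snapshot_taken is included because the refresh query below writes it):
CREATE TABLE snapshot (
    category        INT NOT NULL PRIMARY KEY,
    cnt             INT NOT NULL,
    snapshot_taken  DATETIME            -- written by ON DUPLICATE KEY UPDATE below
);
CREATE TABLE snapshot_time (
    taken DATETIME NOT NULL             -- single record: time of the last refresh
);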
Periodically, refresh the snapshot with these two queries:
UPDATE  snapshot_time
SET     taken = NOW();

INSERT
INTO    snapshot (category, cnt, snapshot_taken)
SELECT  category, new_cnt, new_taken
FROM    (
        SELECT  b.category, COUNT(*) AS new_cnt,
                (
                SELECT  taken
                FROM    snapshot_time
                ) AS new_taken
        FROM    bookvisit bv
        JOIN    book b
        ON      b.isbn = bv.isbn
        WHERE   bv.visit_time <=
                (
                SELECT  taken
                FROM    snapshot_time
                )
        GROUP BY
                b.category
        ) q
ON DUPLICATE KEY UPDATE
        cnt = q.new_cnt,
        snapshot_taken = q.new_taken;
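If you want MySQL itself to run the refresh, you could wrap it in an event (assuming the event scheduler is enabled; the event name is just an example):
DELIMITER $$

CREATE EVENT refresh_category_snapshot
ON SCHEDULE EVERY 1 HOUR
DO
BEGIN
    UPDATE snapshot_time
    SET    taken = NOW();

    -- the INSERT ... ON DUPLICATE KEY UPDATE from above goes here
END $$

DELIMITER ;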
Create the following indexes:
snapshot (cnt)
bookvisit (visit_time)
book (category)
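For example (the index names are arbitrary):
CREATE INDEX ix_snapshot_cnt    ON snapshot (cnt);
CREATE INDEX ix_bookvisit_time  ON bookvisit (visit_time);
CREATE INDEX ix_book_category   ON book (category);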
Run this query:
SELECT  s.category,
        cnt +
        (
        SELECT  COUNT(*)
        FROM    bookvisit bv
        JOIN    book b
        ON      b.isbn = bv.isbn
        WHERE   bv.visit_time >
                (
                SELECT  taken
                FROM    snapshot_time
                )
                AND b.category = s.category
        ) AS total
FROM    snapshot s
WHERE   cnt >=
        (
        SELECT  cnt
        FROM    snapshot
        ORDER BY
                cnt DESC
        LIMIT 3, 1
        )
        -
        (
        SELECT  COUNT(*)
        FROM    bookvisit bv
        WHERE   bv.visit_time >
                (
                SELECT  taken
                FROM    snapshot_time
                )
        )
ORDER BY
        total DESC
LIMIT 4;
This query returns accurate visit counts.
The main idea is that you need to scan only the records in bookvisit
that were collected after the statistics were cached.
Better still, you don't even have to scan all the records in the cached statistics. Since the number of visits only grows, you only need to scan the snapshot rows that could possibly make it into the top four.
If the 4th record has 1,000,000 page views in the snapshot, and 1,000 page views happened after you took the snapshot, you only need to select the snapshot records with cnt >= 999,000. The other records theoretically cannot reach the top four, since that would require more than 1,000 new page views.
The only problem is that books can be deleted or have their categories changed. In that case you would just need to recalculate the statistics or fall back to your original method.
Upvotes: 0
Reputation: 12409
Based on your comment from yesterday, I think the problem is you are doing an ORDER BY on a calculated (aggregate) column, so it can't use an index.
The only way around that would be to add a category.viewcount column that you update when you update book.viewcount. It wouldn't take up much extra space. The downside of this solution is the extra write and locking on the category table every time a view count changes.
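Roughly, it could look like this (column placement and names here are just an illustration):
ALTER TABLE category ADD COLUMN viewcount INT NOT NULL DEFAULT 0;
CREATE INDEX ix_category_viewcount ON category (viewcount);

-- one-time backfill from the existing data
UPDATE category c
SET    c.viewcount = (
       SELECT COALESCE(SUM(bv.viewcount), 0)
       FROM   bookvisit bv
       JOIN   book b ON b.isbn = bv.isbn
       WHERE  b.categoryid = c.categoryid
       );

-- whenever a book's view count is bumped (? is the book's ISBN), bump the category too
UPDATE category
SET    viewcount = viewcount + 1
WHERE  categoryid = (SELECT categoryid FROM book WHERE isbn = ?);

-- popular categories then become a simple indexed read
SELECT categoryid, categoryname, viewcount
FROM   category
ORDER BY viewcount DESC
LIMIT 4;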
If this were me and performance was essential, I would write a separate service that reads the current stats into memory on startup, and have the system send updates to it and query it for the latest stats. That avoids locking and writing to the category table.
Of course, if the numbers don't have to be 100% up to date, you can always run the query overnight and cache the results for the day.
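For instance, a nightly job could just materialize the result (the cache table name is made up, and I'm assuming viewcount lives on bookvisit):
CREATE TABLE popular_category_cache (
    categoryid   INT NOT NULL PRIMARY KEY,
    categoryname VARCHAR(100),
    cnt          BIGINT NOT NULL
);

-- run once a night (cron, or a MySQL event)
TRUNCATE TABLE popular_category_cache;

INSERT INTO popular_category_cache (categoryid, categoryname, cnt)
SELECT b.categoryid, c.categoryname, SUM(bv.viewcount) AS cnt
FROM   bookvisit bv
JOIN   book b ON b.isbn = bv.isbn
LEFT JOIN category c ON c.categoryid = b.categoryid
WHERE  b.categoryid IS NOT NULL AND b.categoryid <> 0
GROUP BY b.categoryid, c.categoryname
ORDER BY cnt DESC
LIMIT 4;
During the day the application reads the four cached rows instead of running the aggregate.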
Upvotes: 1