Reputation: 1083
I've two tables tbl_data
and tbl_user_data
Structure of tbl_data
id (int) (primary)
names (varchar)
dept_id (int)
Structure of tbl_user_data
:
id (int) (primary)
user_id (int)
names_id (int)
tbl_data.id
and tbl_user_data.names_id
are foreign key
I've situation where I've to pick 25 random entries from tbl_data
which is not served earlier to particular user. So I've created a tbl_user_data
which will store user_id
and names_id
(from tbl_data
which is already served).
I'm bit confused, how to query on behalf of this or is there any other way to do this efficiently ?
Note: tbl_data
have more than 5 million entries.
So far I've written this but it seems its not right.
SELECT td.names, td.dept_id
FROM tbl_data AS td
LEFT JOIN tbl_user_data AS tud ON td.id = tud.names_id
WHERE tud.user_id !=2
ORDER BY RAND( ) LIMIT 25
Upvotes: 1
Views: 67
Reputation: 108706
Two things:
First ... you need the LEFT JOIN .... IS NULL
pattern to pick out your not-yet-served items. You'll need to mention the user id in the ON clause to get this to work correctly.
SELECT td.names, td.dept_id
FROM tbl_data AS td
LEFT JOIN tbl_user_data AS tud ON td.id = tud.names_id
AND tud.user_id = 2
WHERE tud.id IS NULL
ORDER BY RAND( ) LIMIT 25
Second, ORDER BY RAND() LIMIT ...
is a notoriously poor performer on a large table. It has to select the entire table, then sort it, then discard all except 25 items from it. That's massively wasteful and will never perform decently.
You can make it a little less wasteful by sorting just the id
values, then using them to get the other information.
This gets your 25 random ID values.
SELECT td.id
FROM tbl_data AS td
LEFT JOIN tbl_user_data AS tud ON td.id = tud.names_id
AND tud.user_id = 2
WHERE tud.id IS NULL
ORDER BY RAND( )
LIMIT 25
This gets your names and dept_id values.
SELECT a.names, a.dept_id
FROM tbl_data AS a
JOIN (
SELECT td.id
FROM tbl_data AS td
LEFT JOIN tbl_user_data AS tud ON td.id = tud.names_id
AND tud.user_id = 2
WHERE tud.id IS NULL
ORDER BY RAND( )
LIMIT 25
) b ON a.id = b.id
But, it's still wasteful. You may want to build a randomized version of this tbl_data table, and then use it sequentially. You could re-randomize it once a day, with something like this.
DROP TABLE tbl_data_random;
INSERT INTO tbl_data_random FROM
SELECT *
FROM tbl_data
ORDER BY RAND()
That way you don't do the sort over and over again, just to discard the results. Instead, you randomize once in a while.
Upvotes: 1
Reputation: 67
Create index on names_id and user_id. Why is user_id varchar? If need to be varchar and is varchar very long, create partial index on user_id. You can use EXPLAIN to see what index use your query.
Upvotes: 0
Reputation: 39497
As you're not selecting anything from the tbl_user_data, you can use exists instead:
SELECT td.names, td.dept_id
FROM tbl_data AS td
where exists (
select 1
from tbl_user_data AS tud
where td.id = tud.names_id
and tud.user_id !=2
)
ORDER BY RAND( ) LIMIT 25
Index on tbl_data(id) and tbl_user_data(names_id, user_id) will help.
Upvotes: 0