jpm
jpm

Reputation: 1083

Sql efficient query from multiple tables

I've two tables tbl_data and tbl_user_data

Structure of tbl_data

id (int) (primary)
names (varchar)
dept_id (int)

Structure of tbl_user_data:

id (int) (primary)
user_id (int)
names_id (int)

tbl_data.id and tbl_user_data.names_id are foreign key

I've situation where I've to pick 25 random entries from tbl_data which is not served earlier to particular user. So I've created a tbl_user_data which will store user_id and names_id(from tbl_data which is already served). I'm bit confused, how to query on behalf of this or is there any other way to do this efficiently ?

Note: tbl_data have more than 5 million entries.

So far I've written this but it seems its not right.

SELECT td.names, td.dept_id
FROM tbl_data AS td
LEFT JOIN tbl_user_data AS tud ON td.id = tud.names_id
WHERE tud.user_id !=2
ORDER BY RAND( ) LIMIT 25

Upvotes: 1

Views: 67

Answers (3)

O. Jones
O. Jones

Reputation: 108706

Two things:

First ... you need the LEFT JOIN .... IS NULL pattern to pick out your not-yet-served items. You'll need to mention the user id in the ON clause to get this to work correctly.

SELECT td.names, td.dept_id
  FROM tbl_data AS td 
  LEFT JOIN tbl_user_data AS tud    ON td.id = tud.names_id
                                   AND tud.user_id = 2
 WHERE tud.id IS NULL
 ORDER BY RAND( ) LIMIT 25

Second, ORDER BY RAND() LIMIT ... is a notoriously poor performer on a large table. It has to select the entire table, then sort it, then discard all except 25 items from it. That's massively wasteful and will never perform decently.

You can make it a little less wasteful by sorting just the id values, then using them to get the other information.

This gets your 25 random ID values.

                 SELECT td.id
                   FROM tbl_data AS td 
                   LEFT JOIN tbl_user_data AS tud    ON td.id = tud.names_id
                                                    AND tud.user_id = 2
                  WHERE tud.id IS NULL
                  ORDER BY RAND( )
                  LIMIT 25

This gets your names and dept_id values.

  SELECT a.names, a.dept_id
    FROM tbl_data AS a
    JOIN (
                 SELECT td.id
                   FROM tbl_data AS td 
                   LEFT JOIN tbl_user_data AS tud    ON td.id = tud.names_id
                                                    AND tud.user_id = 2
                  WHERE tud.id IS NULL
                  ORDER BY RAND( )
                  LIMIT 25
         ) b ON a.id = b.id

But, it's still wasteful. You may want to build a randomized version of this tbl_data table, and then use it sequentially. You could re-randomize it once a day, with something like this.

 DROP TABLE tbl_data_random;
 INSERT INTO tbl_data_random FROM
 SELECT * 
   FROM tbl_data
  ORDER BY RAND()

That way you don't do the sort over and over again, just to discard the results. Instead, you randomize once in a while.

Upvotes: 1

kole90s
kole90s

Reputation: 67

Create index on names_id and user_id. Why is user_id varchar? If need to be varchar and is varchar very long, create partial index on user_id. You can use EXPLAIN to see what index use your query.

Upvotes: 0

Gurwinder Singh
Gurwinder Singh

Reputation: 39497

As you're not selecting anything from the tbl_user_data, you can use exists instead:

SELECT td.names, td.dept_id
FROM tbl_data AS td
where exists (
    select 1
    from tbl_user_data AS tud 
    where td.id = tud.names_id
    and tud.user_id !=2
)
ORDER BY RAND( ) LIMIT 25

Index on tbl_data(id) and tbl_user_data(names_id, user_id) will help.

Upvotes: 0

Related Questions