Tom Pickles

Reputation: 900

Most recent entry from two tables

I have a SQL Server 2000 database with an old table and a new table holding over 20,000,000 records combined. The two tables have exactly the same structure, but were split due to performance issues. I am not the DB admin; I just need data out of it and have been given DBReader rights on it.

OldTable: ClientID, AppID, ModTime, Event

NewTable: ClientID, AppID, ModTime, Event

I need to retrieve the most recent record for each ClientID, AppID and Event combination, from whichever table holds the most recent entry for it. Does anyone have any ideas about the best method for this? I have tried using a UNION, but the query takes over two hours to complete. I was thinking of using a JOIN instead, but I'm not sure of the best approach.

Thanks!

Upvotes: 1

Views: 325

Answers (5)

user359040

Reputation:

For performance, I suggest inserting ClientID, AppID, Event and MAX(ModTime) from the old table (grouped by ClientID, AppID and Event) into a temporary table, appending the same columns from the new table into that temporary table, and then querying ClientID, AppID, Event and MAX(ModTime) from the temporary table, again grouped by ClientID, AppID and Event.
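A sketch of that temp-table approach in T-SQL, assuming the OldTable/NewTable names and columns from the question; #Latest is a hypothetical name. Using SELECT ... INTO for the first step creates the temp table with column types taken from the source table, so none have to be guessed. Temporary tables live in tempdb, so a read-only login should normally still be able to create them.

```sql
-- Step 1: latest ModTime per ClientID/AppID/Event in the old table
SELECT ClientID, AppID, Event, MAX(ModTime) AS MaxModTime
INTO #Latest
FROM OldTable
GROUP BY ClientID, AppID, Event

-- Step 2: append the same per-group aggregate from the new table
INSERT INTO #Latest (ClientID, AppID, Event, MaxModTime)
SELECT ClientID, AppID, Event, MAX(ModTime)
FROM NewTable
GROUP BY ClientID, AppID, Event

-- Step 3: each group now has at most two rows; keep the more recent one
SELECT ClientID, AppID, Event, MAX(MaxModTime) AS ModTime
FROM #Latest
GROUP BY ClientID, AppID, Event

DROP TABLE #Latest
```

Because each step reduces the data to one row per group before the next step runs, the final GROUP BY works on a much smaller set than a UNION over the raw 20 million rows.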

Upvotes: 0

littlegreen

Reputation: 7420

If this is just a one-off job and you only have two tables, just run a 'most-recent-entry' query on the two tables separately. Then do a UNION ALL of the two result sets and use GROUP BY and MAX to keep only the most recent entry. In SQL:

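-- Outer query: for each group, keep the later of the two per-table maxima
SELECT ClientID, AppID, Event, MAX(MaxModTime) AS ModTime FROM (
    -- Most recent entry per group in the old table
    SELECT ClientID, AppID, Event, MAX(ModTime) AS MaxModTime FROM OldTable
    GROUP BY ClientID, AppID, Event
    UNION ALL
    -- Most recent entry per group in the new table
    SELECT ClientID, AppID, Event, MAX(ModTime) AS MaxModTime FROM NewTable
    GROUP BY ClientID, AppID, Event
) Q
GROUP BY ClientID, AppID, Event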

You can improve the speed of such a query by having a composite index on (ClientID, AppID, Event) on both tables or, where possible, a clustered index on (ClientID, AppID, Event, ModTime).
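If the DBA is willing to add indexes (creating them needs more than read-only access, so it is not something DBReader rights allow), a sketch could look like the following. The index names are hypothetical, and putting ModTime in as the last key column of a nonclustered index is an assumption on my part rather than part of the answer above: with it, MAX(ModTime) for each (ClientID, AppID, Event) group can be read from the index without touching the table rows.

```sql
-- Hypothetical index names; must be run by someone with DDL rights on the tables.
-- Key order matches the GROUP BY columns, with ModTime last so that
-- MAX(ModTime) per group can be answered from the index alone.
CREATE NONCLUSTERED INDEX IX_OldTable_ClientAppEventModTime
    ON OldTable (ClientID, AppID, Event, ModTime)

CREATE NONCLUSTERED INDEX IX_NewTable_ClientAppEventModTime
    ON NewTable (ClientID, AppID, Event, ModTime)
```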

Upvotes: 0

Damien_The_Unbeliever

Reputation: 239664

If you're using a plain "UNION", then that could cause issues. UNION ensures that its output contains no duplicates, which generally requires sorting or hashing the entire dataset.

UNION ALL, on the other hand, just returns all rows from both sides.
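A minimal illustration of the difference: both queries below combine two identical one-row results.

```sql
-- UNION removes duplicate rows: returns a single row containing 1
SELECT 1 AS n
UNION
SELECT 1

-- UNION ALL keeps every row: returns two rows, each containing 1
SELECT 1 AS n
UNION ALL
SELECT 1
```

For this question the duplicate check buys nothing, since a row that appears in both tables cannot change the MAX(ModTime) of its group.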

Upvotes: 0

Adriaan Stander

Reputation: 166396

You will have to use a UNION, but if the two tables contain no overlapping rows, consider using UNION ALL, which will be faster.

Also ensure that you have the correct indexes on the tables for this kind of query.

Upvotes: 2

Alex Brown

Reputation: 42872

Why not perform the query on each table, union the results, and repeat the query on the union?

Upvotes: 1
