Sjoerd

Reputation: 84

Compare large MySQL tables

I need to compare two (or more) tables, each with around 60,000 rows and about 60 columns.

Each table has two columns that I want to match on. The goal is to count the rows that exist in TABLE_A but not in TABLE_B, based on those two values.

I ran the following query:

SELECT id
FROM table_a ta
WHERE NOT EXISTS (
  SELECT id
  FROM table_b tb
  WHERE ta.value1=tb.value1 AND ta.value2=tb.value2
)

I've tried the query above and some variations on it, but it takes ages to finish. I'm hoping for a solution that runs in under 10 seconds.

The next query I tried, which I thought was working:

SELECT value1, value2
FROM (
    SELECT ta.value1, ta.value2
    FROM table_a ta
    UNION ALL
    SELECT tb.value1, tb.value2
    FROM table_b tb
) result
GROUP BY value1, value2
HAVING COUNT(*) = 1
ORDER BY value1

This query shows all differences between the two tables: if a value pair exists in TABLE_A but not in TABLE_B it is shown, and vice versa.

So in short, I want to get all rows from TABLE_A that are not present in TABLE_B, based on two values in the row.

Hope someone can help, thanks!

Upvotes: 1

Views: 181

Answers (2)

Sjoerd

Reputation: 84

After some trial and error I improved the second query. I noticed an additional field in my table that I could use to filter the results further.

SELECT date, value1, value2
FROM (
    SELECT date, value1, value2
    FROM (
        SELECT ta.date, ta.value1, ta.value2
        FROM table_1 ta
        UNION ALL
        SELECT tb.date, tb.value1, tb.value2
        FROM table_2 tb
    ) filter
    GROUP BY value1, value2
    HAVING COUNT(*) = 1
) result
WHERE date='YYYY-MM-DD'

This query returns the filtered results in under 4 seconds.
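Note that the HAVING COUNT(*) = 1 approach still reports pairs that occur only in table_2. One possible way to keep only the table_1 side is to tag each row with its source. The following is a sketch, not a tested solution: the `src` column and the `tagged` alias are made-up names, and the date filter is moved onto table_1 as an assumption about what is wanted. Unlike the query above, it also keeps pairs that occur more than once in table_1.

```sql
-- Sketch: label every row with the table it came from, then keep only
-- the (value1, value2) groups that contain no row from table_2.
-- `src` and `tagged` are illustrative names, not part of the original schema.
SELECT MIN(date) AS date, value1, value2
FROM (
    SELECT ta.date, ta.value1, ta.value2, 'a' AS src
    FROM table_1 ta
    WHERE ta.date = 'YYYY-MM-DD'   -- same placeholder as in the query above
    UNION ALL
    SELECT tb.date, tb.value1, tb.value2, 'b' AS src
    FROM table_2 tb
) tagged
GROUP BY value1, value2
-- In MySQL a comparison evaluates to 1 or 0, so this sums the table_2 rows.
HAVING SUM(src = 'b') = 0;
```

Using MIN(date) instead of a bare `date` keeps the query valid when ONLY_FULL_GROUP_BY is enabled.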

Anyway, thanks everyone for the trouble.

Upvotes: 0

Tschallacka

Reputation: 28732

Why not use a join?

/* Create two tables, NAMES and OTHERNAMES */
CREATE TABLE NAMES(Id integer PRIMARY KEY, Name text, LastName text);
CREATE TABLE OTHERNAMES(Id integer PRIMARY KEY, Name text, LastName text);

/* Insert a few records into each table; Tom Riddle exists only in NAMES */
INSERT INTO NAMES VALUES(1,'Tom','Riddle');
INSERT INTO NAMES VALUES(2,'Lucy','I love');
INSERT INTO NAMES VALUES(3,'Frank','Frankly');
INSERT INTO NAMES VALUES(4,'Jane','Austen');
INSERT INTO NAMES VALUES(5,'Robert','Downey');

INSERT INTO OTHERNAMES VALUES(2,'Lucy','I love');
INSERT INTO OTHERNAMES VALUES(3,'Frank','Frankly');
INSERT INTO OTHERNAMES VALUES(4,'Jane','Austen');
INSERT INTO OTHERNAMES VALUES(5,'Robert','Downey');

/* Rows of NAMES that have no matching Name + LastName in OTHERNAMES */
SELECT *
FROM NAMES
LEFT JOIN OTHERNAMES ON
    NAMES.Name = OTHERNAMES.Name
    AND NAMES.LastName = OTHERNAMES.LastName
WHERE OTHERNAMES.Id IS NULL;

See it online: http://sqlfiddle.com/#!9/640c53/1

If you use a LEFT JOIN, rows from the left table that have no match in the right table get NULL in all of the right table's columns, and you can select exactly those rows with a WHERE ... IS NULL filter.

I don't know how efficient that is with your 60,000-row tables, but this usually does the trick for me.
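Applied to the tables from the question, the same idea might look like the sketch below. It assumes that table_b.id is a NOT NULL column (for example the primary key), so that a NULL there can only come from a missing match; `missing_in_b` is just an illustrative alias.

```sql
-- Anti-join sketch: count rows of table_a that have no row in table_b
-- with the same (value1, value2) pair.
-- Assumption: tb.id is NOT NULL in table_b, so NULL means "no match".
SELECT COUNT(*) AS missing_in_b
FROM table_a ta
LEFT JOIN table_b tb
       ON tb.value1 = ta.value1
      AND tb.value2 = ta.value2
WHERE tb.id IS NULL;
```

Replace COUNT(*) with ta.* to list the rows instead of counting them. Duplicate pairs in table_b do not affect the result, because only unmatched table_a rows pass the filter and each of those appears once.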
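Whether this is fast enough probably depends mostly on indexing. A sketch, assuming there is no index on the two compared columns yet (the index name is made up, and table_b.id is assumed to be NOT NULL):

```sql
-- Assumption: table_b has no index covering (value1, value2) yet.
-- The index name is illustrative. If the columns are TEXT or BLOB,
-- MySQL needs a prefix length, e.g. value1(50).
CREATE INDEX idx_table_b_value1_value2 ON table_b (value1, value2);

-- Inspect the plan: ideally table_b is read through the new index
-- rather than scanned in full for every row of table_a.
EXPLAIN
SELECT COUNT(*)
FROM table_a ta
LEFT JOIN table_b tb
       ON tb.value1 = ta.value1
      AND tb.value2 = ta.value2
WHERE tb.id IS NULL;
```

With such an index the lookup into table_b for each table_a row becomes an index probe, which should help the NOT EXISTS variant in the same way.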

Upvotes: 3
