Anthony
Anthony

Reputation: 35928

How to find count differences for IDs in two large tables

I have two large tables. Both containing around 17M rows each. They should have same exact number of rows but I am finding that the counts are different by 343. I want to find out where the counts are different. Tables look like this:

Table A

ID | color
---| ---------
1  | red
1  | green
1  | blue
2  | white
3  | black
3  | red

Tabls B

ID | sale_dates
---| ----------
1  | 2020-10-01
1  | 2020-01-10
2  | 2018-01-09
3  | 2017-08-08

Based on above I would like an output like below:

ID | Table A | Table B | Difference
---| --------| --------| ----------
1  | 5       | 2       | 3
2  | 1       | 1       | 0
3  | 2       | 1       | 1

Or even only find out the ones where the difference is not 0

Upvotes: 0

Views: 172

Answers (2)

Gordon Linoff
Gordon Linoff

Reputation: 1269563

This is a tweak on Nick's answer. I think a full join is very important in this type of situation, because it is possible that some ids are missing from one table or the other:

SELECT ID, a.cnt, b.cnt,
       (COALESCE(a.cnt, 0) - COALESCE(b.cnt, 0)) as difference
FROM (SELECT UPPER(ID) as id, COUNT(*) AS cnt
      FROM A
      GROUP BY UPPER(ID)
     ) A FULL JOIN
     (SELECT UPPER(ID) as id, COUNT(*) AS cnt
      FROM B
      GROUP BY UPPER(ID)
     ) B 
     USING (ID)
ORDER BY difference DESC;

Add:

WHERE COALESCE(a.cnt, 0) <> COALESCE(b.cnt)

if you only want ids where the counts are not the same.

Upvotes: 1

Nick
Nick

Reputation: 147146

If the two tables will always have the same set of ID values, you can just JOIN two derived tables of COUNT(*) values to get your desired output:

SELECT A.ID,
       "Table A",
       "Table B",
       "Table A" - "Table B" AS Difference
FROM (
  SELECT ID, COUNT(*) AS "Table A"
  FROM A
  GROUP BY ID
) A
JOIN (
  SELECT ID, COUNT(*) AS "Table B"
  FROM B
  GROUP BY ID
) B ON A.ID = B.ID
ORDER BY A.ID

Output:

id  Table A     Table B     difference
1   3           2           1
2   1           1           0
3   2           1           1

Demo on dbfiddle

If you only want the ID values which have a non-zero difference, add

WHERE "Table A" - "Table B" > 0

before the ORDER BY clause.

Demo on dbfiddle

Upvotes: 3

Related Questions