JohanB
JohanB

Reputation: 376

Count Differences between rows in SQL table

I have the following SQL Server table:

  Group  |  SubGroup  |  Value
----------------------------------
  D934   |     A      |   100
  D934   |     A      |   101
  D934   |     A      |   102
  D934   |     B      |   100
  D934   |     B      |   103
  D934   |     C      |   102
  D934   |     C      |   105
  D955   |     A      |   100
  D955   |     A      |   103
  D955   |     B      |   101
  D955   |     B      |   102
  D955   |     B      |   103
  D955   |     D      |   101
  D955   |     D      |   103

I would like count the differences between SubGroup A and B, A and C... for each Group. For example for D934, A and B have 3 differences (Values 101, 102 and 103).

Result:

  Group  |  SubGroup  |  SubGroup To Compare  |  Differences
-------------------------------------------------------------
  D934   |     A      |          A            |      0
  D934   |     A      |          B            |      3
  D934   |     A      |          C            |      3
  D934   |     B      |          A            |      3
  D934   |     B      |          B            |      0
  D934   |     B      |          C            |      4
  D934   |     C      |          A            |      3
  D934   |     C      |          B            |      4
  D934   |     C      |          C            |      0
  D955   |     A      |          A            |      0
  D955   |     A      |          B            |      3
  D955   |     A      |          D            |      2
  D955   |     B      |          A            |      3
  D955   |     B      |          B            |      0
  D955   |     B      |          D            |      1
  D955   |     D      |          A            |      2
  D955   |     D      |          B            |      1
  D955   |     D      |          D            |      0

I would like to achieve the result table in a single SQL query. Can you advise?

Upvotes: 2

Views: 1044

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1269503

This is a complicated problem. You can do this calculation with a self join and counting. The key counts are:

  • How many values match between the two groups.
  • How many unique values in each group.

The difference is then the sum of the unique values in each group minus twice the first value -- that is because matching values are counted twice.

So, the query is:

select t1.grp, t1.subgrp, t2.subgrp,
       (case when t1.subgrp = t2.subgrp then 0
             else count(distinct t1.val) +
                  count(distinct t2.val) -
                  2 * sum(case when t1.val = t2.val then 1 else 0 end)
        end) as diff
from t t1 join
     t t2
     on t1.grp = t2.grp 
group by t1.grp, t1.subgrp, t2.subgrp
order by 1, 2, 3;

Here is a db<>fiddle.

Upvotes: 2

Related Questions