August Lilleaas
August Lilleaas

Reputation: 54593

How to use SQL to count the occurrence of groups and treat two different columns as a unsorted set?

Given the table:

CREATE TABLE foo (
    thing_a VARCHAR2(15),
    thing_b VARCHAR2(15)
);

And the rows

INSERT INTO foo (thing_a, thing_b) VALUES 'red', 'green';
INSERT INTO foo (thing_a, thing_b) VALUES 'red', 'green';
INSERT INTO foo (thing_a, thing_b) VALUES 'green', 'red';
INSERT INTO foo (thing_a, thing_b) VALUES 'red', 'blue';

I would like to get the result

green,red: 3
blue,red: 1

In other words, I would like to count the occurrence of each group of thing_a and thing_b, but I want to consider "red" and "green" to be the same as "green" and "red".

The closest I've come to a "solution" is this:

SELECT DISTINCT (
    CASE WHEN thing_a < thing_b
    THEN thing_a || thing_b
    ELSE thing_b || thing_a END
) as thing FROM foo;

Which yields:

greenred
bluered

So it does the grouping correctly, but it doesn't count anything.

Any suggestions for how to solve this?

Upvotes: 0

Views: 43

Answers (2)

jarlh
jarlh

Reputation: 44766

You're close, almost there. Simply wrap up the case query in a derived table and then GROUP BY its result:

select thing, count(*)
from
(
    SELECT CASE WHEN thing_a < thing_b THEN thing_a || thing_b
           ELSE thing_b || thing_a
           END as thing
    FROM foo
) dt
group by thing

Too make sure no items like ab and cde and confused with abcd and e, add a comma between the columns. (Thanks to mathguy for pointing this out!)

select thing, count(*)
from
(
    SELECT CASE WHEN thing_a < thing_b THEN thing_a || ', ' || thing_b
           ELSE thing_b || ', ' || thing_a
           END as thing
    FROM foo
) dt
group by thing

Alternatively, do a UNION ALL in a derived table, and GROUP BY its result:

select ta, tb, count(*)
from
(
    select thing_a as ta, thing_b as tb from foo where thing_a <= thing_b
    union all
    select thing_b, thing_a from foo where thing_a > thing_b
) dt
group by ta, tb

ANSI SQL compliant and portable!

Upvotes: 1

user5683823
user5683823

Reputation:

This can be all done in a single SELECT statement - group by least value, greatest value. Note: This will not work if there may be null in either column; if that is a possibility, then the query must be written with a little more care to handle it.

with
     foo ( thing_a, thing_b ) as (
       select 'red'  , 'green' from dual union all
       select 'red'  , 'green' from dual union all
       select 'green', 'red'   from dual union all
       select 'red'  , 'blue'  from dual union all
       select 'ab'   , 'cde'   from dual union all
       select 'abcd' , 'e'     from dual union all
       select 'cde'  , 'ab'    from dual
     )
-- end of test data; SQL query begins BELOW THIS LINE
select least(thing_a, thing_b) as thing_a, greatest(thing_a, thing_b) as thing_b,
       count(*) as cnt
from   foo
group by least(thing_a, thing_b), greatest(thing_a, thing_b)
order by thing_a, thing_b   --   if needed
;

THING_A THING_B CNT
------- ------- ---
ab      cde       2
abcd    e         1
blue    red       1
green   red       3

4 rows selected.

Upvotes: 1

Related Questions