SQL/Vertica - grouping multi-attribute combinations

Question

I have the following type of dataset:

user_id   country1  city1      country2  city2
1         usa       new york   france    paris 
2         usa       dallas     japan     tokyo 
3         india     mumbai     italy     rome 
4         france    paris      usa       new york 
5         brazil    sao paulo  russia    moscow

I want to group the combinations of country1, city1, country2 and city2 where the order (is something country1 or country2) should not matter. Usually, I would try:

SELECT   country1 
       , city1
       , country2
       , city2 
       , COUNT(*) 
FROM dataset
GROUP BY country1 
       , city1
       , country2
       , city2

However, this code snippet considers the rows with user_id=1 and user_id=4 as two separate cases where I would like them to be considered as equivalent.

Anyone who knows how to coop with this problem?

Thanks in advance!

Gordon Linoff · Accepted Answer

Normally, you approach this type of problem using least() and greatest(), but you have two columns, instead of one. So, let's do it by comparing cities. I am guessing that city is more unique than country:

select (case when city1 < city2 then country1 else country2 end) as country1,
       (case when city1 < city2 then city1 else city2 end) as city1,
       (case when city1 < city2 then country2 else country1 end) as country2,
       (case when city1 < city2 then city2 else city1 end) as city2,
       count(*)
from dataset
group by (case when city1 < city2 then country1 else country2 end),
       (case when city1 < city2 then city1 else city2 end),
       (case when city1 < city2 then country2 else country1 end),
       (case when city1 < city2 then city2 else city1 end)

SQL/Vertica - grouping multi-attribute combinations

Answers (1)

Related Questions