Reputation: 7605
I need to create a query which selects from a particular table the users which have more than one different email. To distinguish users, I group them based on two fields: name and age. Let's see this with an example.
So I have a table like this:
name age email phone
----------------------------------
Andy 20 Andy@du 1234
Berni 21 Berni@du 2345
Carol 22 Carol@du 3456
Andy 20 Andy@du 4321
Berni 21 Berni@et 2345
Dody 28 Dodi@du 7869
Carol 22 Carol@pt 3456
What I want to get is:
Berni 21 Berni@du, Berni@et
Carol 22 Carol@du, Carol@pt
Note that Andy is also twice in the database but with same email (what changes is the phone number). Because of this user I need to make a distinc over email, so only users with two different emails are selected.
With this query I am able to solve the issue and I have the desired result.
select * from
(
select aux.name,
aux.age,
concat_ws(',',collect_set(email)) as email
FROM
(select a.name, a.age, a.email
FROM TestUsers a
RIGHT JOIN
(select name,
age
FROM TestUsers
GROUP BY
name,
age
having count(*) > 1
)b
ON a.name = b.name
AND a.age = b.age
)aux
GROUP BY aux.name,
aux.age
)tr
where locate(",",tr.email) > 0;
But I am sure it has to be a more efficient way than checking when there is not a comma in the email field(which means more than one email).
Has anyone in mind a better approach?
Upvotes: 2
Views: 809
Reputation: 1269773
If I understand correctly, you should be able to do this using a having
clause:
select tu.name, tu.age,
concat_ws(',', collect_list(tu.email)) as emails
from (select distinct tu.name, tu.age, tu.email
from TestUsers tu
) tu
group by tu.name, tu.age
having count(*) > 1;
Actually, because collect_set()
removes duplicates, this should work without a subquery:
select tu.name, tu.age,
concat_ws(',', collect_set(tu.email)) as emails
from testusers tu
group by tu.name, tu.age
having min(tu.email) <> max(tu.email);
Upvotes: 2