Reputation: 1953
I have a table Hobby, whose snippet is as follows:
Name Activity Hours
John Hiking .5
Sam Cycling .5
Sam Swimming 1
Sam Hiking .5
John Running 1
Sam Sailing 1
For every person X in (X, Y), I would like to find the sum of hours of activities where X and Y don't have in common. For example, if John = X and Sam = Y, then it would yield 1, since Running is the only activity John has that Sam doesn't.
My code is as follows:
select a.Name, b.Name, sum(a.Hours)
from Hobby a, Hobby b
where a.Name <> b.Name and a.Activity <> b.Activity
group by a.Name, b.Name;
However, this gave me a wrong answer. What is wrong with my code?
Upvotes: 2
Views: 3973
Reputation: 10976
Think of the underlying results before the group by. Consider only rows where table a is John, Hiking
. In your query you would have
John Hiking 0.5 John Hiking 0.5
John Hiking 0.5 Sam Cycling 0.5
John Hiking 0.5 Sam Swimming 1
John Hiking 0.5 Sam Hiking 0.5
John Hiking 0.5 John Running 1
John Hiking 0.5 Sam Sailing 1
With your where clause, you'd remove the John rows and the Hiking rows from table b leaving:
John Hiking 0.5 Sam Cycling 0.5
John Hiking 0.5 Sam Swimming 1
John Hiking 0.5 Sam Sailing 1
So you're going to be counting these hours three times in the John, Sam
set.
Here's a way to adjust it, while still only using each table once:
Select
a.Name,
b.Name Name2,
Sum(a.Hours) / count(distinct b.activity)
- Sum(case when a.Activity = b.Activity then a.Hours else 0 end) as Hours
From
Hobby a,
Hobby b
Where
a.Name != b.Name
Group By
a.Name,
b.Name
Upvotes: 0
Reputation: 52336
If you take the list of people and activities, and join to it a list of all other people, and test whether the other people also take part in that activity, it ought to do the trick.
with
cte_unique_names as (
select distinct name
from hobby)
select
h.name participates,
n.name does_not_participate,
sum(hours) hours
from
hobby h
cross join
cte_unique_names n
where
n.name != h.name and
not exists (
select null
from hobby h2
where h2.name = n.name and
h2.activity = h.activity)
group by
h.name,
n.name
Upvotes: 0
Reputation: 2006
your from clause reads
FROM Hobby a, Hobby b
Putting a comma in the from clause means "CROSS JOIN" which means each row in the first table are correlated with every row in the second table. Given your where clause I would think this gives some pretty big numbers.
your query needs to be a bit different:
select sum(hours)
from hobby
where name = 'John'
and activity not in (
select activity
from hobby
where name = 'Sam'
)
Upvotes: 0
Reputation: 1269553
I find this to be a tricky question. My original approach was going to use a full outer join
. But then I realized that if there is no match on the activity in one name, then I'm not going to have the name either.
So, the following query works by getting a list of all pairs of names. This is an ordered list, so a given pair of names only appears once. Then this is joined to the Hobby
table twice, using left outer join
to get the matches. The key, though, is that when there is no match, the row with Activity
on it is still present, but with a NULL
value.
The where
clause finds all Activity
s that have a NULL
in either table. These are the ones that don't match. Then it is a simple matter of just adding up the hours:
select names.Name1, names.Name2, sum(coalesce(h1.hours, h2.hours))
from (select distinct h1.Name as name1, h2.Name as name2
from Hobby h1 cross join Hobby h2
where h1.Name < h2.Name
) names left outer join
Hobby h1
on names.name1 = h1.name left outer join
Hobby h2
on names.name2 = h2.name and
h1.Activity = h2.Activity
where h1.Activity is null or h2.Activity is null
group by names.Name1, names.Name2;
Upvotes: 4