yuk
yuk

Reputation: 19870

SQL for mySQL: How to select only data if value in one column exist in all categories from another column

One column in my data is sample, another is category. Duplicate are allowed. Number of unique categories I currently have is 5. Here is a simplified example:

sample category other_columns
122    a
123    a
124    a
125    a
123    b
124    b
125    b
122    c
123    c
124    c
...    ...

I need to select only those samples that exist in all categories (122 is not in 'b' and 125 is not in 'c').

sample category
123    a
124    a
123    b
124    b
123    c
124    c

So, if I run

SELECT category, COUNT(DISTINCT sample, category)
FROM my_table
GROUP BY category

all counts should be the same.

Upvotes: 2

Views: 3972

Answers (4)

slaakso
slaakso

Reputation: 9050

Try thinking the query like this:

"Find me all samples that do not belong to a group of samples that are missing some of the existing categories" -> find all samples that have all the categories.

select m.sample, m.category
from my_table m
where not exists (
  select *
  from my_table m2
  where not exists (
    select *
    from my_table m3
    where m.sample = m3.sample and m2.category = m3.category
  )
)
order by m.category, m.sample;

Upvotes: 1

Sparky
Sparky

Reputation: 15085

Try this

select sample,category
from my_table
group by sample,category
having count(*)=5

Actually, the above query is wrong, I misunderstood the question. Try this instead.

select sample,count(distinct category)
from my_table
group by sample
having count(distinct category)>=5

Upvotes: 3

tscho
tscho

Reputation: 2054

This one should do the trick:

SELECT sample 
FROM my_table 
GROUP BY sample
HAVING count(DISTINCT category) >= (SELECT count(DISTINCT category) FROM my_table);

Upvotes: 1

Ben
Ben

Reputation: 52853

You have a table categories ( category, category_description, etc... ) and a foreign key constraint in sample so all your samples must have a category. Then your query is:

select sample
  from ( select distinct sample, category from my_table )
 group by sample
having count(*) = ( select count(*) 
                      from categories )         

This means that you can change the number of categories and it makes no difference. You know exactly what categories each sample has as they're all the ones in categories.

EDIT Taking into account @ajreal's comment...

Upvotes: 0

Related Questions