Clint_A
Clint_A

Reputation: 536

PostgreSQL filter group by individual values

I have a query that returns data as shown;

name | field | count_1 | count_2 |
-----|-------|---------|---------|
John |  aaa  |    3    |    3    |
John |  bbb  |    3    |    3    |
John |  ccc  |    3    |    3    |
John |  ddd  |    1    |    1    |
Dave |  aaa  |    3    |    3    |
Dave |  bbb  |    3    |    3    |
Dave |  ccc  |    3    |    3    |
Dave |  ddd  |    3    |    3    |
-----|-------|---------|---------|

I need to filter this data by the counts where count_1 and count_2 are =3. In the above case, for John on field ddd both counts don't meet the condition so the query should return Dave only regardless of the other conditions met by John on the other fields. How can I achieve this?

As long as a single count is not met by an individual on the given fields, he should be filtered out.

Upvotes: 2

Views: 9159

Answers (3)

klin
klin

Reputation: 121494

Use the boolean aggregate bool_and() in the having clause to get names meetign the conditions:

select name
from the_data
group by 1
having bool_and(count_1 = 3 and count_2 = 3)

 name 
------
 Dave
(1 row)

You can use the above as a subquery to filter and return original rows (if you need this):

select *
from the_data
where name in (
    select name
    from the_data
    group by 1
    having bool_and(count_1 = 3 and count_2 = 3)
    )

 name | field | count_1 | count_2 
------+-------+---------+---------
 Dave | aaa   |       3 |       3
 Dave | bbb   |       3 |       3
 Dave | ccc   |       3 |       3
 Dave | ddd   |       3 |       3
(4 rows)    

Upvotes: 4

Gordon Linoff
Gordon Linoff

Reputation: 1269463

I think you want:

with t as (
      <your query here>
     )
select t.*
from (select t.*,
             count(*) filter (where count_1 <> 3) over (partition by name) as cnt_1_3,
             count(*) filter (where count_2 <> 3) over (partition by name) as cnt_2_3
      from t
     ) t
where cnt_1_3 = 0 and cnt_2_3 = 0;

If you don't want the original rows, I would go for aggregation:

select name
from t
group by name
having min(count_1) = max(count_1) and min(count_1) = 3 and
       min(count_2) = max(count_2) and min(count_2) = 3;

Or you can phrase this as:

having sum( (count_1 <> 3)::int ) = 0 and
       sum( (count_2 <> 3)::int ) = 0

Note that all the above assume the counts are not NULL (which seems reasonable for something called a count). You can use NULL-safe comparison (is distinct from) if NULL values are a possibility.

Upvotes: 2

sticky bit
sticky bit

Reputation: 37472

If I got this right, a NOT EXISTS might help you.

SELECT *
       FROM (<your query>) x
       WHERE NOT EXISTS (SELECT *
                                FROM (<your query) y
                                WHERE y.name = x.name
                                      AND (y.count_1 <> 3
                                           OR y.count_2 <> 3));

Replace <your query> with your query, that gave you the posted result (or use a CTE for that but be aware, that this may cause performance issues in Postgres).

Maybe there is a more elegant solution, that already "short cuts" into your query, but to find such would require more information about your schema and the current query.

Upvotes: 4

Related Questions