theG
theG

Reputation: 1231

How to select rows if unique?

I'm trying to do a two part query and return a text array of semicolon delimited column values.

First, select only rows that are unique based on values of three columns (i.e. if the three tuple of values exists more than once, it is not unique). Of the remaining rows, perform another filer based on a fourth column.

This is how I think about it, but maybe there's a better solution.

I've tried this a few different ways. My current attempt is using CTE:

with uniqe as (
    select distinct on (
        col1,
        col2,
        col3
    ) *
    from MyTable
)
select concat(col::text, ';', col2::text, ';', col3)
    as key
    from uniqe
    where upper(dateRange) <= (now() - interval '1 days')
    order by key;

The issue I'm running into is that SELECT DISTINCT ON (col1, col2, col3) ... seems to pick at least 1 row of the others that I'm not considering "unique".

Just to be clear, here's a sample table:

 id  | col1 | col2 | col3 |                       dateRange
-----+------+------+------+-------------------------------------------------------
  1  |   1  |   1  |  A   |   ["2018-12-31 16:01:40-08","2018-12-31 16:03:20-08")
  2  |   1  |   1  |  A   |   ["2018-12-31 16:01:40-08","2018-12-31 16:03:20-08")
  3  |   1  |   1  |  B   |   ["2018-12-31 16:01:40-08","2018-12-31 16:03:20-08")
  4  |   1  |   2  |  A   |   ["2018-12-31 16:01:40-08","2018-12-31 16:03:20-08")
  5  |   2  |   1  |  A   |   ["2018-12-31 16:01:40-08","2018-12-31 16:03:20-08")
  6  |   2  |   1  |  A   |   ["2018-12-31 16:01:40-08","2018-12-31 16:03:20-08")
  7  |   1  |   2  |  B   |   ["2018-12-31 16:01:40-08","2018-12-31 16:03:20-08")
  8  |   1  |   2  |  B   |   ["2018-12-31 16:01:40-08","2018-12-31 16:03:20-08")

I consider rows 3 & 4 to be the only unique rows.

Upvotes: 0

Views: 595

Answers (1)

unutbu
unutbu

Reputation: 879093

Essentially, the problem boils down to selecting rows based on the value of a aggregate or window function. Thus the solutions here are applicable, except than in our case we want count(*) to equal 1.

Thus, we could use the WHERE IN method:

WITH uniqe AS (
    SELECT *
    FROM MyTable
    WHERE (col1, col2, col3) IN (
        SELECT col1, col2, col3
        FROM MyTable
        GROUP BY col1, col2, col3
        HAVING count(*) = 1
    ) AS t
)    

or the PARTITION BY method:

WITH uniqe AS (
    SELECT *
    FROM (
        SELECT col1, col2, col3, dateRange
            , count(*) OVER (PARTITION BY col1, col2, col3) AS cnt
        FROM MyTable
    ) AS t
    WHERE cnt = 1
)    

As Andomar explains, PARTITION BY is similar to GROUP BY in the way it affects how the window function's result is calculatiod, but unlike GROUP BY, it does not affect the number of rows returned.

Upvotes: 1

Related Questions