Travis
Travis

Reputation: 33

How to query for values that share another value in common

Greenplum 4.2.2.4 (like PostgreSQL 8.2) on server.

I have data like:

id    | user
------+------
12345 | bob
12345 | jane
12345 | mary
44455 | user1
44455 | user2
44455 | user3
67890 | bob
53756 | bob
53756 | bob
53756 | bob
25246 | jane
54383 | jane
54383 | jane
54383 | jane

I only want to return rows where the "id" is shared by multiple unique "user" values. However, I'm also querying based on a list of "user" values that I am interested in. For example:

WHERE user IN ('mary','bob','user2')

I want the query to return:

id    | user
------+------
12345 | bob
12345 | jane
12345 | mary
44455 | user1
44455 | user2
44455 | user3

How can I do this?

Upvotes: 2

Views: 82

Answers (6)

Gordon Linoff
Gordon Linoff

Reputation: 1269973

You can do this using window functions:

select id, user
from (select t.*, min(user) over (partition by id) as minuser,
             max(user) over (partition by id) as maxuser
      from table t
     ) t
where minuser <> maxuser;

EDTI: Without window functions (which I thought were around since Postgres 8.1 but I trust Erwin on this matter), you can do the same thing with a join and group by:

select t.id, t.user
from table t join
     (select user, min(user) as minuser, max(user) as maxuser
      from table t
      group by user
      having min(user) <> max(user)
     ) tu
     on t.user = tu.user;

Upvotes: 1

Erwin Brandstetter
Erwin Brandstetter

Reputation: 657022

Postgres 8.2 does not have window functions (introduced with version 8.4).
Since you are looking for rows where

the "id" is shared by multiple unique "user" values.

SELECT t2.id, t2.user
FROM   tbl t1
JOIN   tbl t2 USING (id)    -- retrieve all rows with same id
WHERE  t1.user IN ('mary','bob','user2')
AND    EXISTS (
   SELECT 1
   FROM   tbl
   WHERE  id = t1.id
   AND    user <> t1.user   -- at least one other user with same id
   )
ORDER  BY t2.id, t2.user;

Names are symbolic. One wouldn't use the reserved word user as identifier.

This variant may be faster:

SELECT id, user
FROM (
    SELECT id
    FROM   tbl t1
    WHERE  user IN ('mary','bob','user2')
    AND    EXISTS (
        SELECT 1
        FROM   tbl
        WHERE  id = t1.id
        AND    user <> t1.user
        )
    ) sub
JOIN   tbl USING (id)
ORDER  BY id, user;

Either query returns all rows as per your request - including complete duplicates. If you only want distinct rows:

SELECT DISTINCT id, user ...

Upvotes: 0

Clodoaldo Neto
Clodoaldo Neto

Reputation: 125284

SQL Fiddle

select id, "user"
from
    (
        select id
        from t
        group by id
        having
            count(distinct "user") > 1
            and
            array['mary','bob','user2']::varchar(5)[] && array_agg("user")
    ) s
    inner join
    t using (id)
order by id, user

Upvotes: 0

wildplasser
wildplasser

Reputation: 44250

CREATE TABLE users( id INTEGER NOT NULL
        , username varchar
        );

INSERT INTO users (id, username) VALUES
  (12345 , 'bob' )
, (12345 , 'jane' )
, (12345 , 'mary' )
, (44455 , 'user1' )
, (44455 , 'user2' )
, (44455 , 'user3' )
, (67890 , 'bob' )
, (53756 , 'bob' )
, (53756 , 'bob' )
, (53756 , 'bob' )
, (25246 , 'jane' )
, (54383 , 'jane' )
, (54383 , 'jane' )
, (54383 , 'jane' )
        ;

SELECT *
FROM users u1
WHERE EXISTS (
        SELECT *
        FROM users u2
        -- id must at least have one of these three usernames
        WHERE u2.username IN ('mary','bob','user2')
        AND u2.id = u1.id
        AND EXISTS (
                SELECT *
                FROM users u3
                WHERE u3.id = u2.id
                -- and there must exist a different username for this id
                AND u3.username <> u2.username
                )
        );

result:

CREATE TABLE
INSERT 0 14
  id   | username 
-------+----------
 12345 | bob
 12345 | jane
 12345 | mary
 44455 | user1
 44455 | user2
 44455 | user3
(6 rows)

Upvotes: 0

Senthil
Senthil

Reputation: 189

Try this query out. I tested in a postgresql table with 3.5 million rows and it took roughly 1.7 seconds.

select id,
       uname 
from   (
    select 
           id,
           uname,
           count(*) over (partition by id,uname) as count_of_unique_id_share,
           count(*) over (partition by id) as count_of_id_share 
    from 
           (select * from (select distinct id,uname from <TABLE>) z 
        where  id in (select id from <TABLE> where uname in ('mary','bob','user2')))y ) x 
where 
        count_of_unique_id_share = 1 and count_of_id_share > 1

Upvotes: 0

Alisa
Alisa

Reputation: 3072

Try this solution:

In t1, the repeated rows that are not unique (like <53756, bob>) are converted to one record.

Then, in the outer parentheses, those id's shared with only one user are filtered (like <25246, Jane> or <53756, bob> that is now converted to one record).

The records with these id's are the answer then:

select *
    from OriginalTable
    where id in 
        (
        select id 
            from ( 
                select distinct id, user
                    from OriginalTable
                ) as t1
            group by id
            having count(*) > 1
        )

Upvotes: 0

Related Questions