Reputation: 33
Greenplum 4.2.2.4 (like PostgreSQL 8.2) on server.
I have data like:
id | user
------+------
12345 | bob
12345 | jane
12345 | mary
44455 | user1
44455 | user2
44455 | user3
67890 | bob
53756 | bob
53756 | bob
53756 | bob
25246 | jane
54383 | jane
54383 | jane
54383 | jane
I only want to return rows where the "id" is shared by multiple unique "user" values. However, I'm also querying based on a list of "user" values that I am interested in. For example:
WHERE user IN ('mary','bob','user2')
I want the query to return:
id | user
------+------
12345 | bob
12345 | jane
12345 | mary
44455 | user1
44455 | user2
44455 | user3
How can I do this?
Upvotes: 2
Views: 82
Reputation: 1269973
You can do this using window functions:
select id, user
from (select t.*, min(user) over (partition by id) as minuser,
max(user) over (partition by id) as maxuser
from table t
) t
where minuser <> maxuser;
EDTI: Without window functions (which I thought were around since Postgres 8.1 but I trust Erwin on this matter), you can do the same thing with a join
and group by
:
select t.id, t.user
from table t join
(select user, min(user) as minuser, max(user) as maxuser
from table t
group by user
having min(user) <> max(user)
) tu
on t.user = tu.user;
Upvotes: 1
Reputation: 657022
Postgres 8.2 does not have window functions (introduced with version 8.4).
Since you are looking for rows where
the "id" is shared by multiple unique "user" values.
SELECT t2.id, t2.user
FROM tbl t1
JOIN tbl t2 USING (id) -- retrieve all rows with same id
WHERE t1.user IN ('mary','bob','user2')
AND EXISTS (
SELECT 1
FROM tbl
WHERE id = t1.id
AND user <> t1.user -- at least one other user with same id
)
ORDER BY t2.id, t2.user;
Names are symbolic. One wouldn't use the reserved word as identifier.user
This variant may be faster:
SELECT id, user
FROM (
SELECT id
FROM tbl t1
WHERE user IN ('mary','bob','user2')
AND EXISTS (
SELECT 1
FROM tbl
WHERE id = t1.id
AND user <> t1.user
)
) sub
JOIN tbl USING (id)
ORDER BY id, user;
Either query returns all rows as per your request - including complete duplicates. If you only want distinct rows:
SELECT DISTINCT id, user ...
Upvotes: 0
Reputation: 125284
select id, "user"
from
(
select id
from t
group by id
having
count(distinct "user") > 1
and
array['mary','bob','user2']::varchar(5)[] && array_agg("user")
) s
inner join
t using (id)
order by id, user
Upvotes: 0
Reputation: 44250
CREATE TABLE users( id INTEGER NOT NULL
, username varchar
);
INSERT INTO users (id, username) VALUES
(12345 , 'bob' )
, (12345 , 'jane' )
, (12345 , 'mary' )
, (44455 , 'user1' )
, (44455 , 'user2' )
, (44455 , 'user3' )
, (67890 , 'bob' )
, (53756 , 'bob' )
, (53756 , 'bob' )
, (53756 , 'bob' )
, (25246 , 'jane' )
, (54383 , 'jane' )
, (54383 , 'jane' )
, (54383 , 'jane' )
;
SELECT *
FROM users u1
WHERE EXISTS (
SELECT *
FROM users u2
-- id must at least have one of these three usernames
WHERE u2.username IN ('mary','bob','user2')
AND u2.id = u1.id
AND EXISTS (
SELECT *
FROM users u3
WHERE u3.id = u2.id
-- and there must exist a different username for this id
AND u3.username <> u2.username
)
);
result:
CREATE TABLE
INSERT 0 14
id | username
-------+----------
12345 | bob
12345 | jane
12345 | mary
44455 | user1
44455 | user2
44455 | user3
(6 rows)
Upvotes: 0
Reputation: 189
Try this query out. I tested in a postgresql table with 3.5 million rows and it took roughly 1.7 seconds.
select id,
uname
from (
select
id,
uname,
count(*) over (partition by id,uname) as count_of_unique_id_share,
count(*) over (partition by id) as count_of_id_share
from
(select * from (select distinct id,uname from <TABLE>) z
where id in (select id from <TABLE> where uname in ('mary','bob','user2')))y ) x
where
count_of_unique_id_share = 1 and count_of_id_share > 1
Upvotes: 0
Reputation: 3072
Try this solution:
In t1, the repeated rows that are not unique (like <53756, bob>) are converted to one record.
Then, in the outer parentheses, those id's shared with only one user are filtered (like <25246, Jane> or <53756, bob> that is now converted to one record).
The records with these id's are the answer then:
select *
from OriginalTable
where id in
(
select id
from (
select distinct id, user
from OriginalTable
) as t1
group by id
having count(*) > 1
)
Upvotes: 0