nittyjee
nittyjee

Reputation: 389

Select distinct on multiple columns simultaneously, and keep one column in PostgreSQL

For a table such as this:

tblA  

A,B,C  
1,2,t3a  
1,3,d4g  
1,2,b5e  
1,3,s6u 

I want to produce a table that selects distinct on both A and B simultaneously, and still keep one value of C, like so:

tblB  

A,B,C  
1,2,t3a  
1,3,d4g 

Seems like this would be simple, but not finding it for the life of me.

DROP TABLE IF EXISTS tblA CASCADE;  
SELECT DISTINCT ON (A,B), C  
INTO tblB  
FROM tblA;  

Upvotes: 9

Views: 24460

Answers (5)

mohamadi_arch
mohamadi_arch

Reputation: 169

According to postgresql documentation, you need to specify the names, inside and outside of prantenses.

SELECT DISTINCT ON (A,B) A,B,C  
INTO tblB  
FROM tblA;

The image below is another example, with fake data from my project that demonstrates that the intended syntax works properly.

My case

Upvotes: 0

TomoMiha
TomoMiha

Reputation: 1279

drop table t;
create table t (a int, b int, c int);

insert into t (a, b, c) values(1, 2, 3);
insert into t (a, b, c) values(1, 3, 4);
insert into t (a, b, c) values(1, 2, 5);
insert into t (a, b, c) values(1, 3, 6);

select distinct on (a,b) a, b, c
from t;

From the docs: "SELECT DISTINCT eliminates duplicate rows from the result. SELECT DISTINCT ON eliminates rows that match on all the specified expressions."

Upvotes: 0

This should do the trick

CREATE TABLE tblB AS (
    SELECT A, B, max(C) AS max_of_C FROM tblA GROUP BY A, B
)

Upvotes: 4

JL Peyret
JL Peyret

Reputation: 12214

Use a view to do the distinct and then join it to the original table to pick one row of column C. Inserting into the target is left for you to figure out. Oh, and you could pick up multiple columns from t, not just c - the only thing is that your subquery needs to find a way to limit it to only one row.

create table t (a int, b int, c int);

create view tv as select distinct a, b from t;

insert into t (a, b, c) values(1, 2, 10);
insert into t (a, b, c) values(1, 2, 20);
insert into t (a, b, c) values(1, 3, 30);
insert into t (a, b, c) values(1, 3, 40);

CREATE TABLE tblB AS (
select tv.a, tv.b, t.c  from tv, t
where tv.a = t.a and tv.b = t.b
/* pick smallest ctid which is a unique row id built into postgres */
and t.ctid = (select min(ctid) from t s where s.a = t.a and s.b = t.b);
)

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1271141

When you use DISTINCT ON you should have ORDER BY:

SELECT DISTINCT ON (A,B), C
INTO tblB
FROM tblA
ORDER BY A, B;

Upvotes: 7

Related Questions