tonyibm
tonyibm

Reputation: 641

How to get the count of duplicate value for all columns of a table

I have a table like this,

dept_no | employee_id 
1       | 001
1       | 002
2       | 003
2       | 004

I want to get values like this:

field_name | count_of_distinct_value
dept_no    | 2
employee_id| 4

I know how to get the count of distinct value for a certain field, but don't know how to get for all columns at a time. How can I do that?

Upvotes: 0

Views: 402

Answers (2)

Abelisto
Abelisto

Reputation: 15614

select key, count(distinct value)
from (select (jsonb_each(to_jsonb(t.*))).* from pg_class as t) as tt
group by key;

It is definitely not the most efficient solution but it is applicable for any table. Just replace pg_class by the desired table name.

PS: I got a lot of pain proposing this solution. Imagine the table with 100M rows and 100 columns. Then PostgreSQL should to build and sort the intermediate data with 10000000000 rows.

If you don't want the exact numbers but only evaluative then look at the pg_stats table at the n_distinct column in particular.

Upvotes: 2

Gordon Linoff
Gordon Linoff

Reputation: 1269773

Probably the simplest method is union all:

select 'dept_no' as colname, count(distinct dept_no)
from t
union all
select 'employee_id' as colname, count(distinct employee_id)
from t;

Unpivoting and aggregating is another possible solution. But it probably has slightly worse performance and requires converting the columns to a string (a common type):

select colname, count(distinct val)
from t cross join lateral
    (values ('dept_no', dept_no::text), ('employee_id', employee_id)
     ) v(colname, val)
group by colname;

Or better, in one pass:

select
    colname,
    count_of_distinct_value
from (
    select
        count(distinct dept_no) as dept_no,
        count(distinct employee_id) as employee_id
    from t) ....

Upvotes: 1

Related Questions