erniereg
erniereg

Reputation: 41

PostgreSQL - return most common value for all columns in a table

I've got a table with a lot of columns in it and I want to run a query to find the most common value in each column.

Ordinarily for a single column, I'd run something like:

SELECT country
FROM users
GROUP BY country
ORDER BY count(*) DESC
LIMIT 1

Does PostgreSQL have a built in function for doing this or can anyone suggest a query I could run to achieve this?

Upvotes: 4

Views: 5265

Answers (4)

alexkovelsky
alexkovelsky

Reputation: 4190

Starting from PG 9.4 there is aggregate function for this:

mode() WITHIN GROUP (ORDER BY sort_expression)

returns the most frequent input value (arbitrarily choosing the first one if there are multiple equally-frequent results)

And for earlier versions, you could create one...

  CREATE OR REPLACE FUNCTION mode_array(anyarray)
            RETURNS anyelement AS
$BODY$
    SELECT a FROM unnest($1) a GROUP BY 1 ORDER BY COUNT(1) DESC, 1 LIMIT 1;
$BODY$
LANGUAGE SQL IMMUTABLE;



CREATE AGGREGATE mode(anyelement)(
   SFUNC = array_append, --Function to call for each row. Just builds the array
   STYPE = anyarray,
   FINALFUNC = mode_array, --Function to call after everything has been added to array
   INITCOND = '{}'--Initialize an empty array when starting
) ;

Usage: SELECT mode(column) FROM table;

Upvotes: 2

Clodoaldo Neto
Clodoaldo Neto

Reputation: 125214

This window function version will read the users table and the computed table once each. The correlated subquery version will read the users table once for each of the columns. If the columns are many as in the OPs case then my guess is that this is faster. SQL Fiddle

select distinct on (country_count, age_count) *
from (
    select
        country,
        count(*) over(partition by country) as country_count,
        age,
        count(*) over(partition by age) as age_count
    from users
) s
order by country_count desc, age_count desc
limit 1

Upvotes: 1

Filipe Roxo
Filipe Roxo

Reputation: 722

Using the same query, for more than one column you should do:

SELECT *
FROM
(
    SELECT country
    FROM users
    GROUP BY 1
    ORDER BY count(*) DESC
    LIMIT 1
) country
,(
    SELECT city
    FROM users
    GROUP BY 1
    ORDER BY count(*) DESC
    LIMIT 1
) city

This works for any type and will return all the values in the same row, with the columns having its original name.

For more columns just had more subquerys as:

,(
    SELECT someOtherColumn
    FROM users
    GROUP BY 1
    ORDER BY count(*) DESC
    LIMIT 1
) someOtherColumn

Edit:

You could reach it with window functions also. However it will not be better in performance nor in readability.

Upvotes: 4

Kevin
Kevin

Reputation: 30151

If I were doing this, I'd write a query like this one:

SELECT 'country', country
FROM users
GROUP BY country
ORDER BY count(*) DESC
LIMIT 1
UNION ALL
SELECT 'city', city
FROM USERS
GROUP BY city
ORDER BY count(*) DESC
LIMIT 1
-- etc.

It should be noted this only works if all the columns are of compatible types. If they are not, you'll probably need a different solution.

Upvotes: 1

Related Questions