Reputation: 39
I need help with manipulating a table in PostgreSQL.
I have a table with three columns and more than 30k rows. Many of the values repeat themselves in the lines.
I must write a SELECT
statement that outputs the most frequent values of the first column, the most frequent values of the second column and the most common combination of both columns. The whole thing must be grouped by the third column.
What I've tried:
SELECT *
FROM
(SELECT column1 AS " Most frequent1", COUNT(column1) AS "occurrence"
FROM table_name
GROUP BY column1
ORDER BY occurrence DESC
LIMIT 1),
(SELECT column2 AS "Most frequent2", COUNT(column2) AS "occurrence"
FROM table_name
GROUP BY column2
ORDER BY occurrence DESC
LIMIT 1),
(SELECT CONCAT(column1, column2) AS "kombiniert", COUNT(CONCAT(column1, column2)) AS "occurrence"
FROM table_name
GROUP BY kombiniert
ORDER BY occurrence DESC
LIMIT 1);
How can I group the whole thing by column3?
Is there a better method?
Upvotes: 0
Views: 1106
Reputation: 81
Actually, if you want to get the mode of two columns you can use the sum operator. Something like this:
MODE() WITHIN GROUP (ORDER BY (col1 + col2)) as mode_col_1_and_2
EDITED: After reading the comments and a lot of testing, I regret of this answer. @a_horse_with_no_name came with the right way:
MODE() WITHIN GROUP (ORDER BY (col1, col2)) as mode_col_1_and_2
Upvotes: 0
Reputation: 94874
What you are looking for is MODE
, the statistical mode = the most frequent value.
SELECT
column3,
MODE() WITHIN GROUP (ORDER BY column1) AS most_frequent_column1,
MODE() WITHIN GROUP (ORDER BY column2) AS most_frequent_column2,
MODE() WITHIN GROUP (ORDER BY column1 || column2) AS most_frequent_pair
FROM table_name
GROUP BY column3
ORDER BY column3;
Two points:
MODE
ignores nulls. If this is undesired, you must workaround this.For the latter, please see a_horse_with_no_name's comments below. MODE() WITHIN GROUP (ORDER BY (column1, column2))
perfectly solves the ambiguous pairs issue.
Upvotes: 0
Reputation: 1269523
The most common value is called the mode in statistics. This should return the three modes that you are looking for:
select col3,
max(col1_cnt), max(case when seqnum_1 then col1 end),
max(col2_cnt), max(case when seqnum_2 then col2 end),
max(col12_cnt), max(case when seqnum_12 then col1 || ':' || col2 end)
from (select t.*,
row_number() over (partition by col3 order by col1_cnt desc) as seqnum_1,
row_number() over (partition by col3 order by col2_cnt desc) as seqnum_2,
row_number() over (partition by col3 order by col12_cnt desc) as seqnum_12
from (select col1, col2, col3, count(*) as cnt,
sum(count(*)) over (partition by col1) as col1_cnt,
sum(count(*)) over (partition by col2) as col2_cnt,
sum(count(*)) over (partition by col1, col2) as col1_col2_cnt
from t
group by col1, col2, col3
) t
) t
group by col3;
Upvotes: 2