dsr301
dsr301

Reputation: 779

Find most occurring value in a group

I want to find the most occurring value per group.

I tried using top(k)(column) but I get below error: Column class is not under aggregate function and not in GROUP BY.

For example: If I have table test_date with columns(pid, value)

pid, value
----------
1,a
1,b
1,a
1,c

I want result :

pid, value
----------
1,a

I tried SELECT pid,top(1)(value) top_value FROM test_data group by pid

I get the error: 

Column value  is not under aggregate function and not in GROUP BY

I also tried with anyHeavy() but it only works for values that occur more than in half the cases

Upvotes: 3

Views: 2267

Answers (2)

dsr301
dsr301

Reputation: 779

SELECT pid,topK(1)(value) top_value FROM test_data group by pid

Upvotes: 3

vladimir
vladimir

Reputation: 15226

This query should help you:

    SELECT
        pid,
        /*
        Decompose the query in parts:
        1. groupArray((value, count)): convert the group of rows with the same 'pid' to the array of tuples (value, count)
        2. arrayReverseSort: make reverse sorting by 'count' ('x.2' is 'count')
        3. [1].1: take the 'value' from the first item of the sorted array
        */
        arrayReverseSort(x -> x.2, groupArray((value, count)))[1].1 AS value
    FROM
    (
        SELECT
            pid,
            value,
            count() AS count
        FROM test_date
        GROUP BY
            pid,
            value
    )
    GROUP BY pid
    ORDER BY pid ASC

Upvotes: 6

Related Questions