vlad_tepesch
vlad_tepesch

Reputation: 6925

sql query to get for each item the line of a group that has the maximum occurence

the question is difficult to summarize in the title, so here a more verbose example:

I have a huge dataset of dozens of measurements for thousands of different objects. Most of them have an associated type but this type is not unambiguous.

So a Select like

SELECT oid, type, count(type) FROM data GROUP BY oid, type;

will produce something like:

oid   type  count(type)
 0      0   22
 1      0   22
 2      1   61
 2      2  104
 3      2   63
 4      0   34
 6      0    1
 8      2   76
 9      0    1
11      3   33
12      0   55
13      4    1
13      5   28
13      1    2
13      2  255
14      4  148
14      1    4
14      2    3
15      3   10
16      0   13
18      4  137
18      1    5

How can i get only one line per object to the result if this only line has to be the one with the most occurences?

Bonus-Question: also get a percentage per object line that represents the occurrence ratio of this type.

The result should look like:

oid   type  P(type)
 0      0   1.0
 1      0   1.0
 2      2   0.64
 3      2   1.0
 4      0   1.0
 6      0   1.0
 8      2   1.0
 9      0   1.0
11      3   1.0
12      0   1.0
13      2   0.89
14      4   0.95
15      3   1.0
16      0   1.0
18      4   0.96

edit:

some test data and the almost-correct output of one solution:

http://pastebin.com/jVvHErJ2

Upvotes: 0

Views: 77

Answers (2)

Kirill Kin
Kirill Kin

Reputation: 391

This query solves both your problems

SELECT s.oid,
       s.type,
       s.total_per_oid_per_type,
       (s.total_per_oid_per_type + 0.0) / s.total_per_oid AS percentage
FROM (SELECT v.oid,
             v.type,
             v.total_per_oid_per_type,
             ROW_NUMBER() OVER (PARTITION BY v.oid ORDER BY v.total_per_oid_per_type DESC) AS object_number,
             SUM(v.total_per_oid_per_type) OVER (PARTITION BY v.oid) AS total_per_oid
      FROM (SELECT t.oid, t.type, count(1) AS total_per_oid_per_type
            FROM data t
            GROUP BY t.oid, t.type) v ) s
WHERE object_number = 1

Solution special for Sqlite3 (equals to above)

WITH v AS (
    SELECT oid,
           type,
           COUNT(1) AS total_per_oid_per_type
    FROM data
    GROUP BY oid, type
),
s AS (
    SELECT oid,
           MAX(total_per_oid_per_type) AS max_total_per_oid
    FROM v
    GROUP BY oid
),
totals AS (
    SELECT oid,
           SUM(total_per_oid_per_type) AS total_per_oid
    FROM v
    GROUP BY oid
)
SELECT v.oid,
       v.type,
       v.total_per_oid_per_type,
       (v.total_per_oid_per_type + 0.0) / totals.total_per_oid AS percentage
FROM v
     INNER JOIN s ON v.oid = s.oid AND v.total_per_oid_per_type = s.max_total_per_oid
     INNER JOIN totals ON v.oid = totals.oid
ORDER BY v.oid, v.type

Upvotes: 1

JonWay
JonWay

Reputation: 1735

Try this it should work

create table ##TBL (oid INT, [type] INT, [count(type)] INT)
INSERT INTO ##TBL VALUES
(0,0,22),
(1,0,22),
(2,1,61),
(2,2,104),
(3,2,63),
(4,0,34),
(6,0,1),
(8,2,76),
(9,0,1),
(11,3,33),
(12,0,55),
(13,4,1),
(13,5,28),
(13,1,2),
(13,2,255),
(14,4,148),
(14,1,4),
(14,2,3),
(15,3,10),
(16,0,13),
(18,4,137),
(18,1,5)
--------------------------------

SELECT oid
      ,max([type]) as x
      --,Max([count(type)]) AS [count(type)]
      ,CAST( CAST( MAX([count(type)]) AS DECIMAL(10,2) ) /  CAST( SUM([count(type)]) AS DECIMAL(10,2) ) AS DECIMAL(10,2) ) AS 'Percent %'
from ##TBL
group by oid

Upvotes: 0

Related Questions