Reputation: 6925
the question is difficult to summarize in the title, so here a more verbose example:
I have a huge dataset of dozens of measurements for thousands of different objects. Most of them have an associated type but this type is not unambiguous.
So a Select like
SELECT oid, type, count(type) FROM data GROUP BY oid, type;
will produce something like:
oid type count(type)
0 0 22
1 0 22
2 1 61
2 2 104
3 2 63
4 0 34
6 0 1
8 2 76
9 0 1
11 3 33
12 0 55
13 4 1
13 5 28
13 1 2
13 2 255
14 4 148
14 1 4
14 2 3
15 3 10
16 0 13
18 4 137
18 1 5
How can i get only one line per object to the result if this only line has to be the one with the most occurences?
Bonus-Question: also get a percentage per object line that represents the occurrence ratio of this type.
The result should look like:
oid type P(type)
0 0 1.0
1 0 1.0
2 2 0.64
3 2 1.0
4 0 1.0
6 0 1.0
8 2 1.0
9 0 1.0
11 3 1.0
12 0 1.0
13 2 0.89
14 4 0.95
15 3 1.0
16 0 1.0
18 4 0.96
edit:
some test data and the almost-correct output of one solution:
Upvotes: 0
Views: 77
Reputation: 391
This query solves both your problems
SELECT s.oid,
s.type,
s.total_per_oid_per_type,
(s.total_per_oid_per_type + 0.0) / s.total_per_oid AS percentage
FROM (SELECT v.oid,
v.type,
v.total_per_oid_per_type,
ROW_NUMBER() OVER (PARTITION BY v.oid ORDER BY v.total_per_oid_per_type DESC) AS object_number,
SUM(v.total_per_oid_per_type) OVER (PARTITION BY v.oid) AS total_per_oid
FROM (SELECT t.oid, t.type, count(1) AS total_per_oid_per_type
FROM data t
GROUP BY t.oid, t.type) v ) s
WHERE object_number = 1
Solution special for Sqlite3 (equals to above)
WITH v AS (
SELECT oid,
type,
COUNT(1) AS total_per_oid_per_type
FROM data
GROUP BY oid, type
),
s AS (
SELECT oid,
MAX(total_per_oid_per_type) AS max_total_per_oid
FROM v
GROUP BY oid
),
totals AS (
SELECT oid,
SUM(total_per_oid_per_type) AS total_per_oid
FROM v
GROUP BY oid
)
SELECT v.oid,
v.type,
v.total_per_oid_per_type,
(v.total_per_oid_per_type + 0.0) / totals.total_per_oid AS percentage
FROM v
INNER JOIN s ON v.oid = s.oid AND v.total_per_oid_per_type = s.max_total_per_oid
INNER JOIN totals ON v.oid = totals.oid
ORDER BY v.oid, v.type
Upvotes: 1
Reputation: 1735
Try this it should work
create table ##TBL (oid INT, [type] INT, [count(type)] INT)
INSERT INTO ##TBL VALUES
(0,0,22),
(1,0,22),
(2,1,61),
(2,2,104),
(3,2,63),
(4,0,34),
(6,0,1),
(8,2,76),
(9,0,1),
(11,3,33),
(12,0,55),
(13,4,1),
(13,5,28),
(13,1,2),
(13,2,255),
(14,4,148),
(14,1,4),
(14,2,3),
(15,3,10),
(16,0,13),
(18,4,137),
(18,1,5)
--------------------------------
SELECT oid
,max([type]) as x
--,Max([count(type)]) AS [count(type)]
,CAST( CAST( MAX([count(type)]) AS DECIMAL(10,2) ) / CAST( SUM([count(type)]) AS DECIMAL(10,2) ) AS DECIMAL(10,2) ) AS 'Percent %'
from ##TBL
group by oid
Upvotes: 0