Reputation: 40346
I have a table like this (primary key omitted for clarity):
events:
itemId eventType
-----------------
100 1
101 1
101 2
102 2
102 2
There are other event types but I only care about 1's and 2's. I want to find the count of these eventType
s per itemId
, but I also need a way to conveniently perform math on the results. For example, I want an output like this:
itemId ones twos onesPct twosPct
-------------------------------------
100 1 0 1.0 0.0
101 1 1 0.5 0.5
102 0 2 0.0 1.0
In my actual application, the math I'm performing is much more complex than percentages. The dialect is T-SQL. So right now I have a query like this; I'm not that great with SQL and so the best I came up with was:
SELECT
COALESCE(onest.itemId,twost.itemId) itemId,
COALESCE(onest.n,0) ones,
COALESCE(twost.n,0) twos,
1.0*COALESCE(onest.n,0) / (COALESCE(onest.n,0) + COALESCE(twost.n,0)) onesPct,
1.0*COALESCE(twost.n,0) / (COALESCE(onest.n,0) + COALESCE(twost.n,0)) twosPct
FROM
(SELECT itemId, COUNT(*) n
FROM events
WHERE eventType = 1
GROUP BY itemId) onest
FULL OUTER JOIN
(SELECT itemId, COUNT(*) n
FROM events
WHERE eventType = 2
GROUP BY itemId) twost
ON onest.itemId = twost.itemId
This is working except all of the COALESCE
s in the equations are becoming extremely unwieldy. So I have two questions:
Most importantly: Is there some way of converting NULLs from the outer join to 0 at a more global level, so that I don't have to keep writing e.g. COALESCE(onest.n,0)
everywhere? From what I've seen so far, I'm not allowed to use column aliases in other column specifications, so I can't simply do e.g. (1.0 * ones / (ones + twos)) onestPct
in the above query (unless there is a way to do this)?
Is there a shorter query that can accomplish this whole thing? This seems like a mess of subqueries.
I don't care too much about performance, the main task is data mining and analysis.
Also, apologies for the poor title, I couldn't come up with a good short summary.
Upvotes: 1
Views: 81
Reputation: 2991
I know you said what you are doing is more complicated than percentages, but I think regardless the key is simply a subquery.
Do your grouping work first. When that's done, use that as the source of a query in which you do the actual math. And of course, if it's too complex you could consider putting that subquery into a temporary table.
So here's what I would do if it were just the data you provided.
select itemid,
ones,
twos,
ones + twos as total,
100 * (round(ones / (ones + twos), 4)) as ones_perc,
100 * (round(twos / (ones + twos), 4)) as twos_perc
from
(select itemid,
sum(case when eventtype = 1 then 1 else 0 end) as ones,
sum(case when eventtype = 2 then 1 else 0 end) as twos
from a
group by itemid)b
Upvotes: 3
Reputation: 16906
select itemId,ones,twos,ones/c onesPct,twos/c twosPct from (
select itemId,sum(2-eventType) ones,sum(eventType-1) twos,sum(1.0) c from events where eventType in (1,2) group by itemId
) e
Upvotes: 1
Reputation: 43023
You can use aggregate functions (SUM()
) on CASE
which simplifies the whole query:
select itemId,
sum(case eventType when 1 then 1 else 0 end) ones,
sum(case eventType when 2 then 1 else 0 end) twos,
convert(float, sum(case eventType when 1 then 1 else 0 end)) / count(eventType) onesPct,
convert(float, sum(case eventType when 2 then 1 else 0 end)) / count(eventType) twosPct
from events
group by itemId
If there are items which don't have any events, you should either exclude them or handle percentage calculation separately as the query above will cause division by zero.
Upvotes: 1