Jason C
Jason C

Reputation: 40346

Counting occurrence while changing NULL to 0

I have a table like this (primary key omitted for clarity):

events:

itemId  eventType
-----------------
  100      1
  101      1
  101      2
  102      2
  102      2

There are other event types but I only care about 1's and 2's. I want to find the count of these eventTypes per itemId, but I also need a way to conveniently perform math on the results. For example, I want an output like this:

itemId  ones  twos   onesPct  twosPct
-------------------------------------
 100      1    0       1.0      0.0
 101      1    1       0.5      0.5
 102      0    2       0.0      1.0

In my actual application, the math I'm performing is much more complex than percentages. The dialect is T-SQL. So right now I have a query like this; I'm not that great with SQL and so the best I came up with was:

SELECT
  COALESCE(onest.itemId,twost.itemId) itemId,
  COALESCE(onest.n,0) ones,
  COALESCE(twost.n,0) twos,
  1.0*COALESCE(onest.n,0) / (COALESCE(onest.n,0) + COALESCE(twost.n,0)) onesPct,
  1.0*COALESCE(twost.n,0) / (COALESCE(onest.n,0) + COALESCE(twost.n,0)) twosPct
FROM
   (SELECT itemId, COUNT(*) n
    FROM events
    WHERE eventType = 1
    GROUP BY itemId) onest
FULL OUTER JOIN
   (SELECT itemId, COUNT(*) n
    FROM events
    WHERE eventType = 2 
    GROUP BY itemId) twost
ON onest.itemId = twost.itemId

This is working except all of the COALESCEs in the equations are becoming extremely unwieldy. So I have two questions:

  1. Most importantly: Is there some way of converting NULLs from the outer join to 0 at a more global level, so that I don't have to keep writing e.g. COALESCE(onest.n,0) everywhere? From what I've seen so far, I'm not allowed to use column aliases in other column specifications, so I can't simply do e.g. (1.0 * ones / (ones + twos)) onestPct in the above query (unless there is a way to do this)?

  2. Is there a shorter query that can accomplish this whole thing? This seems like a mess of subqueries.

I don't care too much about performance, the main task is data mining and analysis.

Here it is on SQLFiddle.

Also, apologies for the poor title, I couldn't come up with a good short summary.

Upvotes: 1

Views: 81

Answers (3)

user158017
user158017

Reputation: 2991

I know you said what you are doing is more complicated than percentages, but I think regardless the key is simply a subquery.

Do your grouping work first. When that's done, use that as the source of a query in which you do the actual math. And of course, if it's too complex you could consider putting that subquery into a temporary table.

So here's what I would do if it were just the data you provided.

select itemid,
       ones,
       twos,
       ones + twos as total,
       100 * (round(ones / (ones + twos), 4)) as ones_perc,
       100 * (round(twos / (ones + twos), 4)) as twos_perc
from 
    (select itemid,
       sum(case when eventtype = 1 then 1 else 0 end) as ones,
       sum(case when eventtype = 2 then 1 else 0 end) as twos
     from a
     group by itemid)b

Upvotes: 3

Hafthor
Hafthor

Reputation: 16906

select itemId,ones,twos,ones/c onesPct,twos/c twosPct from (
 select itemId,sum(2-eventType) ones,sum(eventType-1) twos,sum(1.0) c from events where eventType in (1,2) group by itemId
) e

Upvotes: 1

Szymon
Szymon

Reputation: 43023

You can use aggregate functions (SUM()) on CASE which simplifies the whole query:

select itemId,
  sum(case eventType when 1 then 1 else 0 end) ones,
  sum(case eventType when 2 then 1 else 0 end) twos,
  convert(float, sum(case eventType when 1 then 1 else 0 end)) / count(eventType) onesPct,
  convert(float, sum(case eventType when 2 then 1 else 0 end)) / count(eventType) twosPct
from events
group by itemId

SQL Fiddle Demo

If there are items which don't have any events, you should either exclude them or handle percentage calculation separately as the query above will cause division by zero.

Upvotes: 1

Related Questions