SQLnoob
SQLnoob

Reputation: 77

distinct count of items using case statements

I'm having trouble getting the count of customers that bought Item A and B . Below is a sample of the data I'm using.

Customer No | Item
___________________
 1            A
 1            B
 2            B
 3            A
 4            A
 4            B
 5            B
 6            A

I'm trying to get a count of how many customers bought Item A and B. This is what i tried so far. and I'm getting the following results Item A = 5, Item B = 3 and Item A and B = 6.

Select 
count (distinct case when ItemNo = 'A' then customerNo end) as [A],
count (distinct case when ItemNo = 'B' then customerNo end) as [B],
count (distinct case when (ItemNo = 'A' or ItemNo = 'B') then customerNo end) as [AandB]
from Items

This is the result I'm trying to get:

Item | Count
A             4
B             4
A and B       2

Can someone please point me to the right direction to get this result. Thanks!

Upvotes: 2

Views: 401

Answers (5)

Jeffrey McCullen
Jeffrey McCullen

Reputation: 61

select count (distinct case when I.ItemNo = 'A' then customerNo end) as A, 
       count (distinct case when I.ItemNo = 'B' then customerNo end) as B, 
       count (distinct case when (I.ItemNo = 'A' and (select count(*) from items as I2 where I2.itemno = 'B' and I.customerNo = I2.customerNo ) > 0) 
       then customerNo end) as AB 
from Items as I;

Instead of using an 'OR' condition in the last COUNT, you want an 'AND'. Here, I'm using a correlated subquery to count the number of rows where the same customer (I.customerNo = I2.customerNo) ordered itemno 'B' (I2.itemno = 'B'). If the count is > 0, then the customer ordered item A and B.

The correlated subquery is established by the I.customerNo = I2.customerNo condition.

This is a longer way that works also if you cannot do a subquery in the COUNT

SELECT SUM(D.A), SUM(D.B), SUM(D.AB) FROM
(
    select A, B, case when U.A > 0 AND U.B > 0 THEN 1 end AS AB
    from (
        select count (distinct case when I.ItemNo = 'A' then customerNo end) as A, 
               count (distinct case when I.ItemNo = 'B' then customerNo end) as B
        from Items as I
        group by I.customerNo
    ) AS U
) AS D;

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1270091

I would do this with two levels of aggregation:

select sum(has_a) as num_A,
       sum(has_b) as num_B,
       sum(has_a * has_b) as num_AB
from (select i.customer,
             max(case when item_no = 'A' then 1 else 0 end) as has_A,
             max(case when item_no = 'B' then 1 else 0 end) as has_B
      from items
      group by i.customer
     ) ic;

You can also get this information on different rows:

select has_a, has_b, count(*)
from (select i.customer,
             max(case when item_no = 'A' then 1 else 0 end) as has_A,
             max(case when item_no = 'B' then 1 else 0 end) as has_B
      from items
      group by i.customer
     ) ic
group by has_A, has_B;

This is not exactly the same, because the values are for only the combination of products. In other words, each customer is counted only once. What I like is:

  • It is easy to extend to more products and combinations.
  • You can change the subquery and have counts instead of flags.
  • Customers are only counted only once.

Upvotes: 0

Vladimir Baranov
Vladimir Baranov

Reputation: 32695

Here is one more way to do it.

SQL Server has a useful operator INTERSECT, which is exactly what you need here - an intersection of two sets (those who bought A and those who bought B).

I think it is more readable than obscure grouping with a HAVING filter.

Performance-wise, you should try all variants on your real data.

Sample data

DECLARE @T TABLE (CustomerNo int, Item varchar(50));
INSERT INTO @T (CustomerNo, Item) VALUES
(1, 'A'),
(1, 'B'),
(2, 'B'),
(3, 'A'),
(4, 'A'),
(4, 'B'),
(5, 'B'),
(6, 'A');

Query

SELECT
    Item
    ,COUNT(DISTINCT CustomerNo) AS CustomerCount
    ,0 AS SortOrder
FROM @T
GROUP BY Item

UNION ALL

SELECT
    'A & B' AS Item
    ,COUNT(*) AS CustomerCount
    ,1 AS SortOrder
FROM
(
    SELECT CustomerNo
    FROM @T
    WHERE Item = 'A'

    INTERSECT

    SELECT CustomerNo
    FROM @T
    WHERE Item = 'B'
) AS T

ORDER BY SortOrder, Item
;

First part of the query counts customer with simple grouping of them by the Item. The second part of the query (after UNION ALL) counts those customers that bought both A and B.

SortOrder column is just to order the final result appropriately.

Result

+-------+---------------+-----------+
| Item  | CustomerCount | SortOrder |
+-------+---------------+-----------+
| A     |             4 |         0 |
| B     |             4 |         0 |
| A & B |             2 |         1 |
+-------+---------------+-----------+

Upvotes: 2

D-Shih
D-Shih

Reputation: 46229

I think your expected result might be A = 4 and B = 4.

You can try to use UNION ALL to get A,B and A & B amount.

SELECT ItemNo,COUNT(distinct customerNo) Count
FROM Items
GROUP BY ItemNo
UNION ALL
SELECT 'A and B',count(*)
FROM (
    SELECT COUNT(DISTINCT ItemNo) cnt
    FROM Items tt
    WHERE ItemNo IN ('A','B') 
    GROUP BY [CustomerNo]
    HAVING COUNT(DISTINCT ItemNo) = 2
) t1

Upvotes: 2

Tim Biegeleisen
Tim Biegeleisen

Reputation: 521674

Here is one approach. We can try first aggregating by customer to generate the A and B counts. Then, take aggregates of those counts.

WITH cte AS (
    SELECT customerNo,
        COUNT(CASE WHEN Item = 'A' THEN 1 END) AS a_cnt,
        COUNT(CASE WHEN Item = 'B' THEN 1 END) AS b_cnt
    FROM Items
    GROUP BY customerNo
)

SELECT 'A' AS Item, COUNT(CASE WHEN a_cnt > 0 THEN 1 END) AS Count, 0 AS pos FROM cte
UNION ALL
SELECT 'B', COUNT(CASE WHEN b_cnt > 0 THEN 1 END), 1 FROM cte
UNION ALL
SELECT 'A and B', COUNT(CASE WHEN a_cnt > 0 AND b_cnt > 0 THEN 1 END), 2 FROM cte
ORDER BY pos;

Upvotes: 0

Related Questions