Insu Q
Insu Q

Reputation: 423

SQL group by and sum based on distinct value in other column (sum once if value in other column is duplicated)

I need help with a group-by query. My table looks like this:

CREATE MULTISET TABLE MY_TABLE (PERSON CHAR(1), ITEM CHAR(1), COST INT);
INSERT INTO MY_TABLE VALUES ('A', '1', 5);
INSERT INTO MY_TABLE VALUES ('A', '1', 5);
INSERT INTO MY_TABLE VALUES ('A', '2', 1);
INSERT INTO MY_TABLE VALUES ('B', '3', 0);
INSERT INTO MY_TABLE VALUES ('B', '4', 10);
INSERT INTO MY_TABLE VALUES ('B', '4', 10);
INSERT INTO MY_TABLE VALUES ('C', '5', 1);
INSERT INTO MY_TABLE VALUES ('C', '5', 1);
INSERT INTO MY_TABLE VALUES ('C', '5', 1);
+--------+------+------+
| PERSON | ITEM | COST |
+--------+------+------+
| A      | 1    |    5 |
| A      | 1    |    5 |
| A      | 2    |    1 |
| B      | 3    |    0 |
| B      | 4    |   10 |
| B      | 4    |   10 |
| C      | 5    |    1 |
| C      | 5    |    1 |
| C      | 5    |    1 |
+--------+------+------+

I need to group items and costs by person, but in different ways. For each person, I need the number of unique items they have. Ex: Person A has two distinct items, item 1 and item 2. I can get this with COUNT(DISTINCT ITEM).

Then for each person, I need to sum the cost but only once per distinct item (for duplicate items, the cost is always the same). Ex: Person A has item 1 for $5, item 1 for $5, and item 2 for $1. Since this person has item 1 twice, I count the $5 once, and then add the $1 from item 2 for a total of $6. The output should look like this:

+--------+---------------------+------------------------+
| PERSON | ITEM_DISTINCT_COUNT | COST_DISTINCT_ITEM_SUM |
+--------+---------------------+------------------------+
| A      |                   2 |                      6 |
| B      |                   2 |                     10 |
| C      |                   1 |                      1 |
+--------+---------------------+------------------------+

Is there an easy way to do this that performs good on a lot of rows?

SELECT PERSON
  ,COUNT(DISTINCT ITEM) ITEM_DISTINCT_COUNT
  -- help with COST_DISTINCT_ITEM_SUM
FROM MY_TABLE
GROUP BY PERSON

Upvotes: 0

Views: 2190

Answers (2)

Gordon Linoff
Gordon Linoff

Reputation: 1270371

I recommend a two levels of aggregation:

select person, count(*) as num_items, sum(cost)
from (select person, item, avg(cost) as cost
      from my_table t
      group by person, item
     ) t
group by person;

Upvotes: 0

Nick
Nick

Reputation: 147206

You can make a subquery which gets the distinct values of item and cost for each person, and then aggregate over that:

SELECT PERSON, 
       COUNT(ITEM) AS ITEM_DISTINCT_COUNT,
       SUM(COST) AS COST_DISTINCT_ITEM_SUM 
FROM (
  SELECT DISTINCT PERSON, ITEM, COST
  FROM MY_TABLE
) M
GROUP BY PERSON

Output:

PERSON  ITEM_DISTINCT_COUNT     COST_DISTINCT_ITEM_SUM
A       2                       6
B       2                       10
C       1                       1

Demo on dbfiddle

Upvotes: 3

Related Questions