Jean-philippe Emond
Jean-philippe Emond

Reputation: 1614

TSQL group by generate duplicate row

I'm trying to extract all prices and taxes by dates range (not necessary the same date) in 2 column and group by ID. Because I need to group by 2 others columns because T-SQL need that:

Column '...' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

I have a duplicate user/ID sometimes. ( don't know why by the way..)

I have this SQL:

WITH myQuery AS 
(
  Select
      c.name, c.id,
      CASE 
      WHEN g.dateCreated BETWEEN CAST ('2016-06-01 00:00:00.000' AS DATETIME) 
           AND CAST ('2017-05-31 23:59:59.000' AS DATETIME) 
      THEN SUM(CAST(g.price AS decimal(20,2) ))
      ELSE 0
      END AS TOTAL_PRICE,
      CASE 
      WHEN g.dateCreated BETWEEN CAST ('2016-01-01 00:00:00.000' AS DATETIME) 
           AND CAST ('2016-12-31 23:59:59.000' AS DATETIME) 
      THEN SUM(CAST(g.tax AS decimal(20,2) ))
      ELSE 0
      END AS TOTAL_TAX
  FROM customers c 
  inner join goodies g
      ON c.id = g.customer_id
  GROUP BY  c.name, c.id, g.dateCreated
)
SELECT count(*) FROM  myQuery

I got 5203 rows. I have only 5031 users.

When I Analyse my data, I have some duplicate data.

Example:

  Alex, 12,   0.00,  0.00
  Alex, 12, 100.00, 14.55
 Nancy,  4,   0.00,  0.00
Arthur, 97,  48.14, 09.17

I tried to group by only by id but it seem that I can't do that.

Why I have a duplicate data and How to prevent that and ensure that I have 1 row by USER even if they don't buy goodies?

Upvotes: 1

Views: 130

Answers (1)

SqlZim
SqlZim

Reputation: 38023

Correcting your conditional aggregation and removing dateCreated from the group by:

with myQuery as (
select
    c.name
  , c.id
  , total_price = sum(case
      when g.dateCreated >= '20160601' and g.dateCreated < '20170601'
        then cast(g.price as decimal(20,2))
      else 0
      end)
  , total_tax = sum(case
      when g.dateCreated >= '20160101' and g.dateCreated < '20170101'
        then cast(g.tax as decimal(20,2))
      else 0
      end)
from customers c
  left join goodies g
    on c.id = g.customer_id
group by
    c.name
  , c.id
--, g.dateCreated
)
select count(*) from  myQuery;

Changing the inner join to a left join will return customers even if they have no corresponding row in goodies.

I also changed your date range code to be more explicit about what is included.

Reference:

Upvotes: 2

Related Questions