Jake Wagner
Jake Wagner

Reputation: 826

Group by Returns Multiple Rows

My query below returns multiple lines instead of aggregating the count. What am I doing wrong.

select case when s.Customer is not null then 'Paid' else 'Free' END as Paid_Free,
       Status,case when CompleteDate < '4/1/2018' then '1H' else '2H' end as Busy, 
       count(*) userCount
from CompletesCur OS 
   left outer join  Sales s on S.Cust = ID
WHERE s.ReturnYear = 2017
  and s.MediaType = 'Online'
  and os.ReturnYearFiled = 2017
  and s.ReturnYear = s.TransactionTaxYear2
  and s.ProductGroup in ('Fed','State','Phone Support', 'Import')
GROUP by s.Customer,Status,CompleteDate

Results:

Paid_Free   Status          Busy   Usercount
Paid      Returning Yr3+     2H      2
Paid      Returning Yr3+     2H      1
Paid      Returning Yr3+     1H      1
Paid      Returning Yr3+     1H      1
Paid      Returning Yr2+     2H      2
Paid      Returning Yr2+     2H      2

Results Requested:

Paid_Free    Status            Busy    UserCount
Paid        Returning Yr3+      2H       3
Paid        Returning Yr3+      1H       2
Paid        Returning Yr2+      2H       4  

Upvotes: 0

Views: 958

Answers (5)

Caius Jard
Caius Jard

Reputation: 74605

What am i doing wrong?

While the other answers explain what to do to fix the problem, you specifically asked what you did wrong and I hope this answer will help you understand that..

Make this change to your select, and you'll see:

select 
   case when s.Customer is not null then 'Paid' else 'Free' END as Paid_Free,
   Status,
   case when CompleteDate < '4/1/2018' then '1H' else '2H' end as Busy, 
   --my additions
   s.customer,
   CompleteDate,
   count(*) userCount

No more duplicate rows! (I didn't squish the duplicates, I added the columns that were actually grouped, so you'll still get all the unexpected rows, but it will be easy to see why they appeared.)

The combination of customer/status/completedate is unique but you transform the data after it has been grouped. The result of transforming this data is that it becomes less precise - there are thousands of dates before 4/1/2018 but you effectively converted it to a boolean value (is-before or is-after). All your rows that were dates of 3/29/2018, 3/28/2018 etc.. and were formerly unique, now all became the same value, hence the apparent duplicates

Hiding info/losing precision in this way, creates duplicates

Upvotes: 0

theo_vvv
theo_vvv

Reputation: 43

You should

group by 
  case when s.Customer is not null then 'Paid' else 'Free' END,
  Status,
  case when CompleteDate < '4/1/2018' then '1H' else '2H' end

Otherwise you are still creating groups made of distinct customers and dates.

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1269803

You need to group by the non-aggregated expressions in the select (and generally only those). Repeating complex expressions can get cumbersome, which is why I like defining new fields in subqueries or using `apply:

select v.Paid_Free, Status, v.Busy, 
       count(*) as userCount
from CompletesCur OS left join
     Sales s
     on S.Cust = ID cross apply
     (values ( case when s.Customer is not null then 'Paid' else 'Free' END,
               case when CompleteDate < '2018-04-01' then '1H' else '2H' end
             )
     ) v(Paid_Free, Busy)
WHERE s.ReturnYear = '2017' and  -- is this really a string?
      MediaType = 'Online' and   -- what table is this from?
      os.ReturnYearFiled = 2017 and
      s.ReturnYear = s.TransactionTaxYear2 and
      s.ProductGroup in ('Fed', 'State', 'Phone Support', 'Import')
GROUP by v.Paid_Free, Status, v.Busy;

Other notes:

  • You should qualify all column names, including Status and MediaType.
  • Is ReturnYear a number or a string? If a number, do not use quotes for the comparison value.
  • I changed the date constant to a standard YYYY-MM-DD format.

Upvotes: 1

JamieD77
JamieD77

Reputation: 13949

Group by what you're expecting in your output

select          case
                    when s.Customer is not null then 'Paid'
                else 'Free'
                end        as Paid_Free
                , Status
                , case
                      when CompleteDate < '4/1/2018' then '1H'
                  else '2H'
                  end      as Busy
                , count(*) userCount
from            CompletesCur OS
left outer join Sales        s on S.Cust = ID
where           s.ReturnYear = '2017'
and             MediaType = 'Online'
and             os.ReturnYearFiled = 2017
and             s.ReturnYear = s.TransactionTaxYear2
and             s.ProductGroup in ('Fed', 'State', 'Phone Support', 'Import')
group by        case
                    when s.Customer is not null then 'Paid'
                else 'Free'
                end
                , Status
                , case
                      when CompleteDate < '4/1/2018' then '1H'
                  else '2H'
                  end

Upvotes: 1

Zaynul Abadin Tuhin
Zaynul Abadin Tuhin

Reputation: 31993

you can try like below

with cte
(
select case when s.Customer is not null then 'Paid' else 'Free' END as Paid_Free,
Status,case when CompleteDate < '4/1/2018' then '1H' else '2H' end as Busy, 
count(*) userCount
     from CompletesCur OS 
   left outer join  Sales s on S.Cust = ID
                  WHERE s.ReturnYear = '2017'
                  and MediaType = 'Online'
                  and os.ReturnYearFiled = 2017
                  and s.ReturnYear = s.TransactionTaxYear2
                  and s.ProductGroup in ('Fed','State','Phone Support', 'Import')

                  GROUP by s.Customer,Status,CompleteDate
) select Paid_Free,Status,Busy, sum(userCount) from cte
  group by Paid_Free,Busy,Status

Upvotes: 1

Related Questions