Larry K
Larry K

Reputation: 49114

SQL for bucketing counts

I'm trying to create a histogram of people's reputations who ask questions on the site using StackExchange data explorer.

The following gives error message:

Each GROUP BY expression must contain at least one column that 
is not an outer reference.
Invalid column name 'lt_100'. ...

Suggestions appreciated

select
  case when Reputation < 100    then "lt_100"
       when Reputation >= 100 and Reputation < 200   then "100_199"
       when Reputation >= 200 and Reputation < 300   then "200_299"
       when Reputation >= 300 and Reputation < 400   then "300_399"
       when Reputation >= 400 and Reputation < 500   then "400_499"
       when Reputation >= 500 and Reputation < 600   then "500_599"
       when Reputation >= 600 and Reputation < 700   then "600_699"
       when Reputation >= 700 and Reputation < 800   then "700_799"
       when Reputation >= 800 and Reputation < 900   then "800_899"
       when Reputation >= 900 and Reputation < 1000  then "900_999"
       else "over 1000"
  end  ReputationRange,
  count(*) as TotalWithinRange
FROM Users
JOIN Posts ON Users.Id = Posts.OwnerUserId 
JOIN PostTags ON PostTags.PostId = Posts.Id
JOIN Tags on Tags.Id = PostTags.TagId
WHERE PostTypeId = 1 and Posts.CreationDate > '9/1/2010'
Group by 
1

Upvotes: 0

Views: 58

Answers (2)

Brian
Brian

Reputation: 13571

Unfortunatley you can't alias the group by using '1' like you can in an order by. - To avoid repeating the case statement in your group by you can leverage the 'with' clause in SQL:

with data as (
select
  case when Reputation < 100    then 'lt_100'
       when Reputation >= 100 and Reputation < 200   then '100_199'
       when Reputation >= 200 and Reputation < 300   then '200_299'
       when Reputation >= 300 and Reputation < 400   then '300_399'
       when Reputation >= 400 and Reputation < 500   then '400_499'
       when Reputation >= 500 and Reputation < 600   then '500_599'
       when Reputation >= 600 and Reputation < 700   then '600_699'
       when Reputation >= 700 and Reputation < 800   then '700_799'
       when Reputation >= 800 and Reputation < 900   then '800_899'
       when Reputation >= 900 and Reputation < 1000  then '900_999'
       else 'over 1000'
       end as ReputationRange FROM Users
JOIN Posts ON Users.Id = Posts.OwnerUserId 
JOIN PostTags ON PostTags.PostId = Posts.Id
JOIN Tags on Tags.Id = PostTags.TagId
WHERE PostTypeId = 1 and Posts.CreationDate > '9/1/2010')
select  ReputationRange, count(*) as TotalWithinRange
from data
Group by ReputationRange

Working Demo/Example

Upvotes: 1

Vamsi Prabhala
Vamsi Prabhala

Reputation: 49260

You should use single-quotes for the ranges being classified. If you use " " it would be treated as a column name. Also you should include the calculation in the group by clause.

Demo

select
  case when Reputation < 100    then 'lt_100'
       when Reputation >= 100 and Reputation < 200   then '100_199'
       when Reputation >= 200 and Reputation < 300   then '200_299'
       when Reputation >= 300 and Reputation < 400   then '300_399'
       when Reputation >= 400 and Reputation < 500   then '400_499'
       when Reputation >= 500 and Reputation < 600   then '500_599'
       when Reputation >= 600 and Reputation < 700   then '600_699'
       when Reputation >= 700 and Reputation < 800   then '700_799'
       when Reputation >= 800 and Reputation < 900   then '800_899'
       when Reputation >= 900 and Reputation < 1000  then '900_999'
       else 'over 1000'
  end ReputationRange,
  count(*) as TotalWithinRange
FROM Users
JOIN Posts ON Users.Id = Posts.OwnerUserId 
JOIN PostTags ON PostTags.PostId = Posts.Id
JOIN Tags on Tags.Id = PostTags.TagId
WHERE PostTypeId = 1 and Posts.CreationDate > '9/1/2010'
Group by 
case when Reputation < 100    then 'lt_100'
       when Reputation >= 100 and Reputation < 200   then '100_199'
       when Reputation >= 200 and Reputation < 300   then '200_299'
       when Reputation >= 300 and Reputation < 400   then '300_399'
       when Reputation >= 400 and Reputation < 500   then '400_499'
       when Reputation >= 500 and Reputation < 600   then '500_599'
       when Reputation >= 600 and Reputation < 700   then '600_699'
       when Reputation >= 700 and Reputation < 800   then '700_799'
       when Reputation >= 800 and Reputation < 900   then '800_899'
       when Reputation >= 900 and Reputation < 1000  then '900_999'
       else 'over 1000'
end

Upvotes: 2

Related Questions