Reputation: 2139
I have a table with orders from different users. Each user (identified by an e-mail address) can have any number of orders. I don't have an extra table with users, just a plain OrderData table. What I want sounds simple:
where 1. is the sum of 2. and 3. "XYZ" users are defined by their e-mail address ending in "@xyz.com".
I want to have a single query, returning the three values in three columns. What I currently have is:
SELECT
(
SELECT COUNT(DISTINCT User_EmailAddress)
FROM OrderData
WHERE User_EmailAddress IS NOT NULL
AND RequestTime >= @RequestTimeFrom
AND RequestTime < @RequestTimeTo
) AS [Total Users],
(
SELECT COUNT(DISTINCT User_EmailAddress)
FROM OrderData
WHERE User_EmailAddress IS NOT NULL
AND User_EmailAddress LIKE '%@xyz.com'
AND RequestTime >= @RequestTimeFrom
AND RequestTime < @RequestTimeTo
) AS [XYZUsers],
(
SELECT COUNT(DISTINCT User_EmailAddress)
FROM OrderData
WHERE User_EmailAddress IS NOT NULL
AND User_EmailAddress NOT LIKE '%@xyz.com'
AND RequestTime >= @RequestTimeFrom
AND RequestTime < @RequestTimeTo
) AS [Non-XYZ Users]
It returns the correct result set:
Total Users | XYZ Users | Non-XYZ Users
------------+-----------+--------------
123 | 23 | 100
Is there a nicer way to write this query, instead of basically having three similar queries with the same code?
Upvotes: 0
Views: 1822
Reputation: 280252
Here is a solution that reduces redundant code and also avoid problems with multiple scans of the data. Since all you really care about is the counts, you don't even need to return any of the actual data, and you can collapse almost all of the logic into a single CASE
expression:
;WITH x AS
(
SELECT isxyz = CASE WHEN User_EmailAddress LIKE '%@xyz.com' THEN 1 ELSE 0 END
FROM dbo.OrderData
WHERE User_EmailAddress IS NOT NULL
AND RequestTime >= @RequestTimeFrom
AND RequestTime < @RequestTimeTo
GROUP BY User_EmailAddress
)
SELECT
TotalUsers = COUNT(isxyz),
XYZUsers = SUM(isxyz),
[Non-XYZ Users] = COUNT(NULLIF(isxyz,1))
FROM x;
The trick is to use the aggregates to your advantage. Since the 1/0
output is not a true BIT
column, you can SUM
it to get the count of 1
s. And the opposite can be achieved by changing the 1
s to NULL
, as NULL
s are ignored by COUNT
.
Upvotes: 2
Reputation: 1
The solution provided by bluefeet still has one drawback - each DISTINCT aggregation (in this case the COUNT) consumes a stream of input rows separately. This may cause separate scans / seeks of the base table - one set per distinct aggregation. Though sometimes the optimizer can spool the base table seek / scan into a work table and then compute the aggregates.
A more performance oriented way to do this (in this specific case) is to get the total count into a variable, then the xyz count into another variable, and then using scalar arithmetic deduce the third value (the non-xyz). That way you can avoid one spool replay (or even worse the base table seek / scan operator.)
Read this blog by Paul White for more information: http://web.archive.org/web/20170606142356/http://sqlblog.com/blogs/paul_white/archive/2011/12/04/is-distinct-aggregation-still-considered-harmful.aspx
Upvotes: 0