Reputation: 55
I am working with an enormous customer record table, (roughly 62 million customers,) and I need to retrieve counts for each customer. Presently, there is an Order table and a Unique Customers table. I perform a simple Inner Join to retrieve the records.
However, due to the size of the Orders table, the temporary file exceeds 15 GB. This is problematic, as the client has limited space available. Is there there a more space-efficient way to retrieve the counts than my current query?
SELECT A.CustomerID
,B.OrderID
,count(distinct B.OrderID) as Num_Orders
FROM UniqueCustomers as A
INNER JOIN Orders as B on A.CustomerID = B.CustomerID
GROUP BY A.CustomerID, B.OrderID
Thanks for your insight!
Upvotes: 3
Views: 107
Reputation: 280262
No idea why you included OrderID
in the output list - you are trying to get counts, right, not a list of 62 million OrderID
s? Additionally I am not sure why you need to apply DISTINCT
to OrderID
- your Orders
table should not allow for duplicate values here; if it does, your schema is messed up.
Also it can be more efficient to use an EXISTS
clause instead of a JOIN
- however that can flip if you are using SELECT INTO
.
SELECT CustomerID, Num_Orders = COUNT(OrderID)
FROM dbo.Orders AS o
WHERE EXISTS
(
SELECT 1 FROM dbo.UniqueCustomers WHERE CustomerID = o.CustomerID
)
GROUP BY CustomerID;
Upvotes: 6