unhappymeal
unhappymeal

Reputation: 55

More space-efficient way to retrieve counts from an enormous table

I am working with an enormous customer record table, (roughly 62 million customers,) and I need to retrieve counts for each customer. Presently, there is an Order table and a Unique Customers table. I perform a simple Inner Join to retrieve the records.

However, due to the size of the Orders table, the temporary file exceeds 15 GB. This is problematic, as the client has limited space available. Is there there a more space-efficient way to retrieve the counts than my current query?

SELECT A.CustomerID
  ,B.OrderID
  ,count(distinct B.OrderID) as Num_Orders
FROM UniqueCustomers as A
INNER JOIN Orders as B on A.CustomerID = B.CustomerID
GROUP BY A.CustomerID, B.OrderID

Thanks for your insight!

Upvotes: 3

Views: 107

Answers (1)

Aaron Bertrand
Aaron Bertrand

Reputation: 280262

No idea why you included OrderID in the output list - you are trying to get counts, right, not a list of 62 million OrderIDs? Additionally I am not sure why you need to apply DISTINCT to OrderID - your Orders table should not allow for duplicate values here; if it does, your schema is messed up.

Also it can be more efficient to use an EXISTS clause instead of a JOIN - however that can flip if you are using SELECT INTO.

SELECT CustomerID, Num_Orders = COUNT(OrderID)
  FROM dbo.Orders AS o
  WHERE EXISTS 
  (
    SELECT 1 FROM dbo.UniqueCustomers WHERE CustomerID = o.CustomerID
  )
  GROUP BY CustomerID;

Upvotes: 6

Related Questions