Reputation: 17
I have an orders table that contains the orders_id, customers_email_address and date_purchased. I want to write a SQL query that will, for each line in the table, add a new field called 'repeat_order_count' that shows how many times this customer ordered before and including this order.
For example, if John ordered once before this order, the repeat_order_count would be 2 for this order, or in other words, this is the second time John has ordered. The next order row I encounter for John will have a 3, and so on. This will allow me to create a line graph that shows the number of orders placed by repeat customers over time. I can now go to a specific time in the past and figure out how many orders were placed by repeat customers during that time period:
SELECT
*
FROM orders
WHERE repeat_order_count > 1
WHERE date_purchased = January 2014 --(simplifying things here)
I'm also able to determine now WHEN a customer became a repeat customer.
I can't figure out the query to solve this. Or perhaps there may be an easier way to do this?
Upvotes: 0
Views: 4152
Reputation: 108370
One approach to retrieving the specified result would be to use a correlated subquery in the SELECT list. This assumes that the customer identifier is customers_email_address
, and that date_purchased
is a DATETIME
or TIMESTAMP
(or other canonical format), and that there are no duplicated values for the same customer (that is, the customer doesn't have two or more orders with the same date_purchased
value.)
SELECT s.orders_id
, s.customers_email_address
, s.date_purchased
, ( SELECT COUNT(1)
FROM orders p
WHERE p.customers_email_address = s.customers_email_address
AND p.date_purchased < s.date_purchased
) AS previous_order_count
FROM orders s
ORDER
BY s.customers_email_address
, s.date_purchased
The correlated subquery will return 0 for the "first" order for a customer, and 1 for the "second" order. If you want to include the current order in the count, replace the <
comparison operator with <=
operator.
FOLLOWUP
For performance of that query, we need to be particulary concerned with the performance of the correlated subquery, since that is going to be executed for every row in the table. (A million rows in the table means a million executions of that query.) Having a suitable index available is going to be crucial.
For the query in my answer, I'd recommend trying an index like this:
ON orders (customers_email_address, date_purchased, orders_id)
With that index in place, we'd expect EXPLAIN
to show the index being used by both the outer query, to satisfy the ORDER BY
(No "Using filesort" in the Extra
column), and as a covering index (no lookups to the pages in the underlying table, "Using index" shown in the Extra
column.)
The answer I gave demonstrated just one approach. It's also possible to return an equivalent result using a join pattern, for example:
SELECT s.orders_id
, s.customers_email_address
, s.date_purchased
, COUNT(p.orders_id)
FROM orders s
JOIN orders p
ON p.customers_email_address = s.customers_email_address
AND p.date_purchased <= s.date_purchased
GROUP
BY s.customers_email_address
, s.date_purchased
, s.orders_id
ORDER
BY s.customers_email_address
, s.date_purchased
, s.orders_id
(This query is based on some additional information provided in a comment, which wasn't available before: orders_id
is UNIQUE in the orders
table.)
If we are guaranteed that orders_id
of a "previous" order is less than the orders_id
of a previous order, then it would be possible to use that column in place of the date_purchased
column. We'd want a suitable index available:
... ON orders (customers_email_address, orders_id, date_purchased)
NOTE: The order of the columns in the index is important. With that index, we could do:
SELECT s.orders_id
, s.customers_email_address
, s.date_purchased
, COUNT(p.orders_id)
FROM orders s
JOIN orders p
ON p.customers_email_address = s.customers_email_address
AND p.orders_id <= s.orders_id
GROUP
BY s.customers_email_address
, s.orders_id
ORDER
BY s.customers_email_address
, s.orders_id
Again, we'd want to review the output from EXPLAIN
to verify that the index is being used for both the join operation and the GROUP BY operation.
NOTE: With the inner join, we need to use a <=
comparison, so we get at least one matching row back. We could either subtract 1 from that result, if we wanted a count of only "previous" orders (not counting the current order), or we could use an outer join operation with a <
comparison, so we could get a row back with a count of 0.
Upvotes: 3
Reputation: 1462
when you are inserting into your orders table, for the column you have for your OrderCount
you use a co-related sub-query.
eg:
select
col1,
col2,
(isnull((select count(*) from orders where custID = @currentCustomer),0) + 1),
col4
Note that you wouldn't be adding the field when the 2nd order is processed, the field would already exist and you would just be populating it.
Upvotes: 0