BeetleJuice
BeetleJuice

Reputation: 40886

SQL: Reuse function result in query without using sub-query

In a MySQL DB table that stores sale orders, I have a LastReviewed column that holds the last date and time when the sale order was modified (type timestamp, default value CURRENT_TIMESTAMP). I'd like to plot the number of sales that were modified each day, for the last 90 days, for a particular user.

I'm trying to craft a SELECT that returns the number of days since LastReviewed date, and how many records fall within that range. Below is my query, which works just fine:

SELECT DATEDIFF(CURDATE(), LastReviewed) AS days, COUNT(*) AS number FROM sales
WHERE UserID=123 AND DATEDIFF(CURDATE(),LastReviewed)<=90
GROUP BY days
ORDER BY days ASC

Notice that I am computing the DATEDIFF() as well as CURDATE() multiple times for each record. This seems really ineffective, so I'd like to know how I can reuse the results of the previous computation. The first thing I tried was:

SELECT DATEDIFF(CURDATE(), LastReviewed) AS days, COUNT(*) AS number FROM sales
WHERE UserID=123 AND days<=90
GROUP BY days
ORDER BY days ASC

Error: Unknown column 'days' in 'where clause'. So I started to look around the net. Based on another discussion (Can I reuse a calculated field in a SELECT query?), I next tried the following:

SELECT DATEDIFF(CURDATE(), LastReviewed) AS days, COUNT(*) AS number FROM sales
WHERE UserID=123 AND (SELECT days)<=90
GROUP BY days
ORDER BY days ASC

Error: Unknown column 'days' in 'field list'. I'm also tried the following:

SELECT @days := DATEDIFF(CURDATE(), LastReviewed) AS days, 
       COUNT(*) AS number FROM sales
WHERE UserID=123 AND @days <=90
GROUP BY days
ORDER BY days ASC

The query returns zero result, so @days<=90 seems to return false even though if I put it in the SELECT clause and remove the WHERE clause, I can see some results with @days values below 90.

I've gotten things to work by using a sub-query:

SELECT * FROM (
  SELECT DATEDIFF(CURDATE(),LastReviewed) AS sales , 
         COUNT(*) AS number FROM sales
  WHERE UserID=123
  GROUP BY days
) AS t
WHERE days<=90
ORDER BY days ASC

However I odn't know whether it's the most efficient way. Not to mention that even this solution computes CURDATE() once per record even though its value will be the same from the start to the end of the query. Isn't that wasteful? Am I overthinking this? Help would be welcome.

Note: Mods, should this be on CodeReview? I posted here because the code I'm trying to use doesn't actually work

Upvotes: 1

Views: 946

Answers (2)

Rick James
Rick James

Reputation: 142218

Builtin functions are much less costly than, say, fetching rows.

You could get a lot more performance improvement with the following 'composite' index:

INDEX(UserID, LastReviewed)

and change to

WHERE UserID=123
  AND LastReviewed >= CURRENT_DATE() - INTERVAL 90 DAY

Your formulation is 'hiding' LastRevieded in a function call, making it unusable in an index.

If you are still not satisfied with that improvement, then consider a nightly query that computes yesterday's statistics and puts them in a "Summary table". From there, the SELECT you mentioned can run even faster.

Upvotes: 1

Michael - sqlbot
Michael - sqlbot

Reputation: 179004

There are actually two problems with your question.

First, you're overlooking the fact that WHERE precedes SELECT. When the server evaluates WHERE <expression>, it then already knows the value of the calculations done to evaluate <expression> and can use those for SELECT.

Worse than that, though, you should almost never write a query that uses a column as an argument to a function, since that usually requires the server to evaluate the expression for each row.

Instead, you should use this:

WHERE LastReviewed < DATE_SUB(CURDATE(), INTERVAL 90 DAY)

The optimizer will see this and get all excited, because DATE_SUB(CURDATE(), INTERVAL 90 DAY) can be resolved to a constant, which can be used on one side of a < comparison, which means that if an index exists with LastReviewed as the leftmost relevant column, then the server can immediately eliminate all of the rows with LastReviewed >= that constant value, using the index.

Then DATEDIFF(CURDATE(), LastReviewed) AS days (still needed for SELECT) will only be evaluated against the rows we already know we want.

Add a single index on (UserID, LastReviewed) and the server will be able to pinpoint exactly the relevant rows extremely quickly.

Upvotes: 2

Related Questions