Steini
Steini

Reputation: 2783

Mysql virtual column in WHERE

I have the following MySQL Query:

SELECT from_unixtime( `lastlogin` ) AS `last`
FROM `players` WHERE `lastlogin` <> 0 AND `last` < '2012-10-01 00:00:00';

And I get the following result:

1054 - Unknown column 'last' in 'where clause'

last cannot be used in the WHERE statement because this is ofcourse a virtual column and doesn't exist. "lastlogin" contains an UNIX Timestamp.

However is there a proper way to use a virtual column in a where statement?

Thanks for your help.

Upvotes: 2

Views: 5520

Answers (3)

spencer7593
spencer7593

Reputation: 108450

TL;DR

The query in the selected answer "should work"; performance is fine with trivial sets. But with large sets, that's going to be hellacious performance.

For best performance, we want to use a query that references the bare columnn in the range condition. Something like this:

SELECT from_unixtime( `lastlogin` ) AS `last`
  FROM `players`
 WHERE `lastlogin` > 0
   AND `lastlogin` < UNIX_TIMESTAMP('2012-10-01 00:00:00')

ORIGINAL ANSWER

The column alias last (in your query) cannot be referenced in the WHERE clause (of your query). This is a MySQL syntax rule. (The reason for the rule, basically, is that the expression referenced by that alias is not available at the time the predicates in the WHERE clause are processed.)

But that column alias CAN be referenced in a predicate in a HAVING clause. The HAVING clause gets processed almost at the end of the execution plan. This works, but is problematic from a performance standpoint:

SELECT from_unixtime( `lastlogin` ) AS `last`
  FROM `players` WHERE `lastlogin` <> 0 
 HAVING `last` < '2012-10-01 00:00:00';

The execution plan for that is nearly equivalent to using an inline view (to create a derived table) and running a SELECT against the derived table (which is also problematic from a performance standpoint):

SELECT d.*
  FROM (
         SELECT from_unixtime( `lastlogin` ) AS `last`
         FROM `players` WHERE `lastlogin` <> 0 
       ) d
WHERE `last` < '2012-10-01 00:00:00';

The column alias last can be referenced here, because the inner query (the inline view aliased as d) gets run first, and the resultset gets stored as a MyISAM table. Then the outer query runs against the MyISAM table. And in this query, the last is referencing a column name from a table, so it can be referenced in a WHERE clause.

IMPORTANT NOTE:

So that sort of answers the question you asked. But,

You don't really want to do that!

This approach will exhibit problematic performance for large sets, since this query is essentially creating a copy of the table, and then running a query on the copy. That from_unixtime function is going to be performed for EVERY row in the table (except for the rows where the lastlogin is NULL or equal to zero.)

From a performance standpoint, it's much better to use a single query (no inline views) and use a predicate on the bare column in the WHERE clause.

Ideally, lastlogin would be stored as DATETIME or TIMESTAMP datatype. If, however, that column is an integer, then it would be much better (performance wise) to convert the literal constant in the predicate into an integer, and then compare that to the bare column. This will allow MySQL to at least consider doing a range scan on an index that has lastlogin as a leading column, rather than performing the from_unixtime function on every row, to compare to the literal.

MySQL provides a convenient UNIX_TIMESTAMP() function for converting a DATETIME into an integer value (it's basically the inverse of the FROM_UNIXTIME() function, with a few caveats.

For best performance, we want a query of the form:

SELECT from_unixtime( `lastlogin` ) AS `last`
  FROM `players`
 WHERE `lastlogin` <> 0
   AND `lastlogin` < UNIX_TIMESTAMP('2012-10-01 00:00:00')

Upvotes: 5

Bill Karwin
Bill Karwin

Reputation: 562573

You can make reference to a column alias if you define the column alias in a subquery, but this is not efficient because the subquery has to generate an interim result set for all rows, just to have them discarded by the condition in the outer query. It would be more efficient to reduce the result set to matching rows as early as possible.

If your lastlogin column is in unixtime format, you could compare it against a constant expression in the WHERE clause:

SELECT FROM_UNIXTIME( `lastlogin` ) AS `last`
FROM `players` WHERE `lastlogin` < UNIX_TIMESTAMP('2012-10-01 00:00:00');

This can benefit from an index on that column and reduce the size of the result set.

I took out the lastlogin <> 0 because you shouldn't store zero dates anyway. If you're using zero as a special value that means "no value," then you should use NULL instead.


Re comment: yes, the correct function name is UNIX_TIMESTAMP(). Thanks for the correction, I've edited the above.

Upvotes: 2

Kermit
Kermit

Reputation: 34063

This should work:

SELECT a.`last`
FROM   (SELECT From_unixtime(`lastlogin`) AS `last` 
        FROM   `players` 
        WHERE  `lastlogin` <> 0) a 
WHERE  a.`last` < '2012-10-01 00:00:00'; 

Upvotes: 4

Related Questions