Reputation: 19863
I have four tables that I am trying to join and output the result to a new table. My code looks like this:
create table tbl
select a.dte, a.permno, (ret - rf) f0_xs_ret, (xs_ret - (betav*xs_mkt)) f0_resid, mkt_cap last_year_mkt_cap, betav beta_value
from a inner join b using (dte)
inner join c on (year(a.dte) = c.yr and a.permno = c.permno)
inner join d on (a.permno = d.permno and year(a.dte)-1 = year(d.dte));
All of the tables have multiple indices and for table a
, (dte, permno)
identify a unique record, for table b
, dte
id's a unique record, for table c
, (yr, permno)
id a unique record and for table d
, (dte, permno)
id a unique record. the explain from the select
part of the query is:
+----+-------------+-------+--------+-------------------+---------+---------+---------- ------------------------+--------+-------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+-------------------+---------+---------+---------- ------------------------+--------+-------------------+
| 1 | SIMPLE | d | ALL | idx1 | NULL | NULL | NULL | 264129 | |
| 1 | SIMPLE | c | ref | idx2 | idx2 | 4 | achernya.d.permno | 16 | |
| 1 | SIMPLE | b | ALL | PRIMARY,idx2 | NULL | NULL | NULL | 12336 | Using join buffer |
| 1 | SIMPLE | a | eq_ref | PRIMARY,idx1,idx2 | PRIMARY | 7 | achernya.b.dte,achernya.d.permno | 1 | Using where |
+----+-------------+-------+--------+-------------------+---------+---------+----------------------------------+--------+-------------------+
Why does mysql have to read so many rows to process this thing? and if i am reading this correctly, it has to read (264129*16*12336)
rows which should take a good month.
Could someone please explain what's going on here?
Upvotes: 0
Views: 192
Reputation: 52372
MySQL has to read the rows because you're using functions as your join conditions. An index on dte
will not help resolve YEAR(dte)
in a query. If you want to make this fast, then put the year in its own column to use in joins and move the index to that column, even if that means some denormalization.
As for the other columns in your index that you don't apply functions to, they may not be used if the index won't provide much benefit, or they aren't the leftmost column in the index and you don't use the leftmost prefix of that index in your join condition.
Sometimes MySQL does not use an index, even if one is available. One circumstance under which this occurs is when the optimizer estimates that using the index would require MySQL to access a very large percentage of the rows in the table. (In this case, a table scan is likely to be much faster because it requires fewer seeks.)
http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html
Upvotes: 2