Reputation: 65
1) First query used ... which took around 23 secs
select a.id from mza_movie_upload a,mza_movie_statics b
where a.status=1 and b.download=1 and a.id=b.rid
group by b.rid order by sum(b.download) desc
Currently i modified the query ..which takes around 9 secs
select a.id from mza_movie_upload a
INNER JOIN mza_movie_statics b
ON a.id=b.rid WHERE a.status=1 and b.download=1
group by b.rid order by sum(b.download) desc
explain select a.id from mza_movie_upload a INNER JOIN mza_movie_statics b ON a.id=b.rid WHERE a.status=1 and b.download=1 group by b.rid order by sum(b.download) desc;
+----+-------------+-------+--------+---------------+---------+---------+----------------------+---------+----------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+--------+---------------+---------+---------+----------------------+---------+----------------------------------------------+ | 1 | SIMPLE | b | ALL | NULL | NULL | NULL | NULL | 1603089 | Using where; Using temporary; Using filesort | | 1 | SIMPLE | a | eq_ref | PRIMARY | PRIMARY | 4 | mmdfurni_dev11.b.rid | 1 | Using where | +----+-------------+-------+--------+---------------+---------+---------+----------------------+---------+----------------------------------------------+ 2 rows in set (0.03 sec)
I am not sure what is the performance to be done ? I want this query to be fast .. I tried to index rid and id which still made the query worse.
Here is the table details
mza_movie_upload
+---------------+--------------+------+-----+---------+----------------+ | Field | Type | Null | Key | Default | Extra | +---------------+--------------+------+-----+---------+----------------+ | id | int(11) | NO | PRI | NULL | auto_increment | | userid | varchar(200) | NO | | NULL | | | email | varchar(200) | NO | | NULL | | | up_date | datetime | NO | | NULL | | | file_size | varchar(200) | NO | | NULL | | | temp_filename | varchar(200) | NO | | NULL | | | fileneame | varchar(200) | NO | MUL | NULL | | | filepath | varchar(255) | NO | | NULL | | | status | varchar(20) | NO | | NULL | | | ip | varchar(200) | NO | | NULL | | | category | varchar(200) | NO | | NULL | | | mcode | bigint(20) | NO | | NULL | | | movie_name | varchar(200) | NO | | NULL | | +---------------+--------------+------+-----+---------+----------------+ 13 rows in set (0.00 sec)
mza_movie_statics
+-----------+---------+------+-----+---------+----------------+ | Field | Type | Null | Key | Default | Extra | +-----------+---------+------+-----+---------+----------------+ | id | int(11) | NO | PRI | NULL | auto_increment | | rid | int(11) | NO | | NULL | | | uid | int(11) | NO | | NULL | | | save | int(11) | NO | | NULL | | | download | int(11) | NO | | NULL | | | enterdate | date | NO | | NULL | | +-----------+---------+------+-----+---------+----------------+ 6 rows in set (0.00 sec)
Upvotes: 0
Views: 128
Reputation: 48129
Your query can be better optimized if you have what is considered a COVERING index. That is... the index has the columns associated with what you are looking for including criteria. This way the engine does not have to go to the raw data to actually check the respective status and download parts.
So, on mza_movie_upload have an index on (id, status) on mza_movie_statics have an index on (rid, download)
Next, the group by will work best on the index that is driving the query, and since a.id = b.rid, but a.id can be the driving index, let IT be the group by value.
select
mu.id
from
mza_movie_upload mu
JOIN mza_movie_statics ms
on mu.id = ms.rid
AND ms.download > 0
group by
b.rid
order by
sum( b.download ) DESC
Now, a comment on the download. It appears to be a numeric, so you probably don't want to compare to '1' explicitly as it appears that column is a counter for the number of times something was downloaded. And what you are looking for is the thing that was downloaded the most times. If this IS always a value of 1, then yes, leave that to = 1 instead of > 0.
Upvotes: 0
Reputation: 36087
Try to rewrite the query into:
SELECT b.rid
FROM mza_movie_upload a
INNER JOIN mza_movie_statics b
ON a.id=b.rid
WHERE a.status= '1' and b.download= '1'
-- group by b.rid order by sum(b.download) desc;
GROUP BY b.rid ORDER BY count(*) DESC;
In this query SELECT a.id
is replaced by SELECT b.rid
, and is 100% equivalent to the oryginal query because of JOIN ... ON a.id=b.rid
predicate, but leads MySql to slighly better plan
And, as @Dennis Leon sugested, a.status= '1' and b.download= '1'
are compared to strings, not numbers.
Try also to replace order by sum(b.download) desc
with order by count(*) desc
- since the query retrieves only rows with b.download = '1', then sum( b.download )
is equivalent to count(*)
- this change allows to save a few hundreds miliseconds on conversion from strings to numbers within SUM( .. )
.
In the end create two indexes:
create index bbbb on mza_movie_statics( download, rid );
create index aaaaa on mza_movie_upload( status );
then try the query speed after above changes.
Upvotes: 0
Reputation: 163
I would recommend applying an index on a.status and/or b.download if you wish to see further performance gains. Keep in mind creating additional indexes do come with additional overhead in regards to insert/update/deleting of records - in this case it seems arguably necessary.
Furthermore, before adding new indexes to these tables (presumably in your production environment) keep in mind mysql will create a temp copy of the table, which, for a table with lots of records (>1 million) may take a while. (so i would recommend testing locally on a table with similar size)
lastly, i noticed that in your query you have in your where clause: a.status=1 however the status column is a varchar. In order to avoid converting between two different data types (which slows down query execution time), and potentially breaking your future index, i would recommend changing it to: a.status='1' (notice the quotes)
Upvotes: 0