Reputation: 3004
Ok, so I'm loading some data into table from another a join like so (where ts is a timestamp column):
insert ignore into target_table select s.user_id,i.item_id,i.artist_id,s.ts
from source_table s join itemlist i on s.item_url = i.item_url;
The thing is, I only want data from within in a certain time range, so my question is: Will it be faster to do this with a where clause on the join, i.e.:
insert ignore into target_table select s.user_id,i.item_id,i.artist_id,s.ts
from source_table s join itemlist i on s.item_url = i.item_url where
ts< '2013-01-01' and ts >= '2005-07-01';
Or, to do the first query, then afterwards do a delete, i.e.:
delete from target_table where ts>= '2013-01-01' and ts < '2005-07-01';
I know I could just test this myself, but I'd also like to understand why one method is preferred
Upvotes: 0
Views: 32
Reputation: 38238
(Assuming that you meant the second method's WHERE clause to actually be the opposite of the first's—otherwise you'll end up with a different result...)
It will depend on the indexes on the tables, and the amount of the data in the tables.
If both tables have an index on ts
, then I can't see it'll make a lot of difference, performance-wise, assuming not many rows, but it would be more usual to limit the initial SELECT
, so you only perform one operation.
The amount of data in the tables will be relevant. Think of this example: there's a billion rows in source_table
, but only one row that matches your criteria. In that case, an indexed SELECT
with a WHERE
clause will find the one row very quickly and copy it into the target table. However, the second method will copy a billion rows to the target table, which will take a long time (and a lot of disk space), and then you'll just delete all but one of them, which may take quite some time even if finding the row you want to keep is quite fast because of the index.
Inserting and then deleting may also lead to more "fragmented" data in the destination table, but that'll probably depend on the storage engine and I don't know enough about how MySQL handles things like that to give an opinion.
Other things to bear in mind: if you INSERT
and then DELETE
, there will be a period of time where the data in the table is "wrong". And if you're not wrapped in a transaction and something goes wrong in between the INSERT
and the DELETE
, it might stay wrong. If you just do the INSERT
with a WHERE
, it's a single atomic operation that will either succeed entirely or roll back.
Upvotes: 1
Reputation: 1270493
The first version should be faster:
insert ignore into target_table(user_id, item_id, artis_it, ts)
select s.user_id,i.item_id,i.artist_id,s.ts
from source_table s join
itemlist i
on s.item_url = i.item_url
where ts < '2013-01-01' and ts >= '2005-07-01';
Here are some reasons why:
Upvotes: 1