Reputation: 4830
I now have a table that has 604 000 row. I would like to drop 4000 random rows so my table will only contains 600 000 entries.
Would there be a quick way to do so ?
Thanks a lot.
Upvotes: 12
Views: 9770
Reputation: 115630
In theory, this will be random and fast. In practise, it will be only fast:
DELETE FROM tableX
LIMIT 4000
This will be random but terribly slow, with 600K rows:
DELETE FROM tableX
ORDER BY RAND()
LIMIT 4000
This won't be truly random (as there are usually gaps in the ids) and it may not even delete exactly 4000 rows (but a few less when there are many gaps) but it's probably faster than the previous.
An extra wrapping in a subquery is needed because the syntax for Delete from multiple tables does not allow LIMIT
:
DELETE td
FROM
tableX AS td
JOIN
( SELECT t.id
FROM
tableX AS t
CROSS JOIN
( SELECT MAX(id) AS maxid
FROM tableX
) AS m
JOIN
( SELECT RAND() AS rndm
FROM tableX AS tr
LIMIT 5000
) AS r
ON
t.id = CEIL( rndm * maxid )
LIMIT 4000
) AS x
ON
x.id = td.id
Explain output (of the subquery, from a 400K rows table):
id table possible_keys key_len rows
select_type type key ref Extra
1 PRIMARY <derived2> system 1
1 PRIMARY <derived3> ALL 5000
1 PRIMARY t eq_ref PRIMARY PRIMARY 4 func 1 Using where;Using index
3 DERIVED tr index PRIMARY 4 398681 Using index
2 DERIVED Select tables optimized away
Upvotes: 23
Reputation: 10636
If I had to venture a guess:
DELETE FROM table where id = (SELECT id FROM table ORDER BY rand() LIMIT 1) LIMIT 10
Upvotes: 2
Reputation: 432471
DELETE FROM TABLE ORDER BY RAND() LIMIT 4000;
It'll take time though...
A quicker way to execute (not write the code!) may be 4000 separate deletes in a loop
DELETE FROM TABLE WHERE AssumedPKisInt = <ARandomNumber>
Of course, you need to ensure you don't try to delete non-existing or already-deleted rows.
Upvotes: 0