Reputation: 31
I have a 1000M data table where i need to have a automated script just keeping last 7 days and delete the before days. I want to do it using python and chunks concept. Want to delete chunk wise.
do we have any library with this chunk concept related to mysql on python?
If no, can anyone suggest me a best method of how to use chunk or apply this with mysql
Upvotes: 1
Views: 450
Reputation: 3504
I'm unaware of a Python package that has an API for "chunking" deletes from a MySQL table. SqlAlchemy provides a fluent interface that can do this but it's not much different than the SQL. I suggest using PyMySql.
import datetime
import pymysql.cursors
connection = pymysql.connect(
host='host',
user='user',
password='password',
database='database'
)
seven_days_before_now = datetime.datetime.now() - datetime.timedelta(days=7)
chunksize = 1000
with connection.cursor() as cursor:
sql = 'DELETE FROM `mytable` WHERE `timestamp` < %s ORDER BY `id` LIMIT %s;'
num_deleted = None
while num_deleted != 0:
num_deleted = cursor.execute(sql, (seven_days_before_now, chunksize))
connection.commit()
The LIMIT
just limits the number of deleted rows to the chunksize
. The ORDER BY
ensures that the DELETE
is deterministic and it sorts by the primary key because the primary key is guaranteed to be indexed; so even though it sorts for each chunk, at least it's sorting on an indexed column. Remove the ORDER BY
if deterministic behavior is unnecessary, it will result in faster execution time.
You'll need to replace the connection details, table name, column name and chunksize
. Also, this solution assumes that the table has a column named id
which is the primary key and an auto-incrementing integer. You'll need to make some changes if your schema differs.
As Bernd Buffen commented: the correct way to get the behavior you desire is to partition the table. Please consider a migration to do so.
And, of course: stop using Python 2, it's been unsupported for almost two years as of the first version of this answer.
Upvotes: 2