Reputation: 190
Hi, I'm a server developer, and we have a large MySQL database (the biggest table has about 0.5 billion rows) running 24/7.
A lot of the data is broken. Most of it is logically wrong and spans multiple sources (multiple tables, S3). Because the logic is fairly complicated, we need our Rails models to clean it; it can't be done with pure SQL queries.
Right now I'm using my own small cleansing framework, with an AWS Auto Scaling group to scale out instances and speed things up. But since the database is in service, I have to be careful (table locks and so on) and limit the number of processes.
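To make the "limit the process amount" part concrete, here is a minimal sketch of a throttled batch walker in plain Ruby. It is not the framework described above; `each_id_batch`, `batch_size`, and `pause` are hypothetical names chosen for illustration. In a real Rails app the block body would typically load rows through ActiveRecord (for example with `find_each` or `in_batches`) instead of just recording the range.

```ruby
# Hypothetical helper: walks an id range in fixed-size chunks and sleeps
# between chunks so a live database is not hit with one long-running pass.
def each_id_batch(min_id, max_id, batch_size:, pause: 0.0)
  raise ArgumentError, "batch_size must be positive" unless batch_size.positive?

  start = min_id
  while start <= max_id
    finish = [start + batch_size - 1, max_id].min
    yield(start..finish)               # caller cleans the rows in this id range
    start = finish + 1
    # Pause only if another batch follows; pause: 0.0 disables throttling.
    sleep(pause) if pause.positive? && start <= max_id
  end
end

# Usage: collect the ranges that would be processed for ids 1..10.
seen = []
each_id_batch(1, 10, batch_size: 4) { |range| seen << range }
```

Keeping each batch short means any locks taken while cleaning are held only briefly, and the pause gives regular traffic room between batches. The right batch size and pause depend on the workload and would need to be measured.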
So I am curious about
Upvotes: 0
Views: 53
Reputation: 1885
I have faced a problem similar in nature to yours, though at a different scale. This is how I would approach it.
It's possible to do this without taking the database offline for maintenance, but I think you will get better results if you do. Also, since this is a Rails app, I would review the model validations and input field validations your app has, so that "broken data" is rejected at write time instead of being cleaned up later.
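As a rough illustration of the validation idea, here is a plain-Ruby stand-in that runs without Rails. In an actual Rails model you would declare these rules with `validates` and call `valid?` on the record. The `Upload` type and its `owner_id` and `s3_key` fields are invented for this example and are not taken from your schema.

```ruby
# Hypothetical record type; the fields are made up for illustration only.
Upload = Struct.new(:id, :owner_id, :s3_key, keyword_init: true) do
  # Returns a list of human-readable problems; an empty list means valid.
  def errors
    problems = []
    problems << "owner_id is missing" if owner_id.nil?
    problems << "s3_key is blank" if s3_key.to_s.strip.empty?
    problems
  end

  def valid?
    errors.empty?
  end
end

rows = [
  Upload.new(id: 1, owner_id: 7, s3_key: "a/b.png"),
  Upload.new(id: 2, owner_id: nil, s3_key: "")
]

# The same rules can gate new writes and flag existing rows for cleanup.
broken_ids = rows.reject(&:valid?).map(&:id)
```

The point of putting the rules in one place is that the same checks can be used both to reject bad input as it arrives and to find existing rows that need cleaning.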
Upvotes: 1