Reputation: 874
I am just now starting to normalize a 600GB database. We are moving it off Amazon RDS using mysqldump to our local server (64GB RAM and 1.5TB SSD), where we will host the new normalized database.
The database consists of a single table that we are going to break up into 4 different tables to remove all the redundant data.
My question is: What is likely to be the best process?
Restoring all of the 600GB to a local temporary database and having a porting script query data from there and insert it into the new normalized structure.
Or
Downloading a CSV dump instead of a .sql file and writing a porting script that reads the CSV and feeds the data into the new normalized structure?
For both solutions I am thinking of building a caching script to deal with the redundant foreign key look-ups, but maybe MySQL can handle that for me somehow?
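For example, I am picturing something roughly like this, where MySQL does the look-ups with a join instead of my own cache (the table and column names below are only placeholders, not the real schema):

    -- hypothetical lookup table for one of the repeated columns
    CREATE TABLE customers (
        id   INT AUTO_INCREMENT PRIMARY KEY,
        name VARCHAR(255) NOT NULL,
        UNIQUE KEY uq_customers_name (name)
    );

    -- fill it from the big table; DISTINCT plus the UNIQUE key deduplicates
    INSERT IGNORE INTO customers (name)
    SELECT DISTINCT customer_name FROM big_table;

    -- resolve the foreign key with a JOIN instead of a per-row cache
    -- (orders here stands for one of the other new tables)
    INSERT INTO orders (customer_id, order_date, amount)
    SELECT c.id, b.order_date, b.amount
    FROM big_table b
    JOIN customers c ON c.name = b.customer_name;

Is that the sort of thing MySQL would handle well at this scale?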
Upvotes: 1
Views: 327
Reputation: 4193
It's usually easier and faster to work solely in a SQL database than to juggle external files as well.
It gives you better performance, and also lets you easily keep the original column types if that's what you want to do. You might have to be quite careful about data types if you go via CSV, which is essentially just text.
So I'd prefer option 1, preferably testing the scripts on an extract of the original table that holds only a small fraction of the original records.
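As a rough sketch (table names are placeholders), the test extract could be made like this; CREATE TABLE ... LIKE also keeps the original column types and indexes:

    -- small sample table for testing the porting scripts
    CREATE TABLE big_table_sample LIKE big_table;

    -- copy an arbitrary slice of rows; adjust the LIMIT to taste
    INSERT INTO big_table_sample
    SELECT * FROM big_table
    LIMIT 100000;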
Although the new database will hopefully take less space than the unnormalised one, if your 1.5 TB SSD is the only attached storage it's going to be fairly full by the time the conversion is complete and before you delete the original. Log files might be very large, depending on how you do it. You might want to stick a few extra TB of HDD in there temporarily.
If that's an issue, would doing the conversion entirely in the remote machine be feasible? Then you'd only need to download the converted database to the local server.
Upvotes: 3