Reputation: 8305
We need to import a large amount of data (about 5 million records) into a PostgreSQL database under a Rails application. The data will be provided in XML format, with images embedded in it encoded as Base64.
The estimated size of the XML file is 40 GB. Which XML parser can handle that much data in Ruby?
Thanks.
Upvotes: 1
Views: 878
Reputation: 34281
You'll want to use some kind of SAX parser. SAX parsers do not load everything into memory at once.
I don't know the Ruby parsers well, but a quick search turned up this blog post. You could start digging from there.
You could also try splitting the XML file into smaller pieces to make it more manageable.
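As a rough illustration of the streaming approach, here is a minimal sketch using Nokogiri's SAX interface (one option among several, not necessarily the one in the linked post). The `record` element name, the `persist` helper, and the `data.xml` path are assumptions for illustration, not from the question:

```ruby
require 'nokogiri'

# Minimal SAX handler: buffers the text of each <record> element and hands it
# off for persistence, without ever holding the whole document in memory.
class RecordHandler < Nokogiri::XML::SAX::Document
  def start_element(name, _attrs = [])
    @buffer = '' if name == 'record' # 'record' is an assumed element name
  end

  def characters(string)
    @buffer << string if @buffer
  end

  def end_element(name)
    return unless name == 'record' && @buffer
    persist(@buffer) # e.g. decode the Base64 image and insert a row
    @buffer = nil
  end

  def persist(text)
    # placeholder: write to PostgreSQL here
  end
end

Nokogiri::XML::SAX::Parser.new(RecordHandler.new).parse(File.open('data.xml'))
```

Because only the current element's text is buffered, memory use stays roughly constant regardless of the file size.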
Upvotes: 3
Reputation: 107728
You could convert the data to CSV and then load it into your database using your DBMS's CSV loading capabilities. For MySQL that's LOAD DATA INFILE, and for PostgreSQL it's COPY. I would not use anything built in Ruby to load a 40 GB file; Ruby is not too good with memory. That's best left to the "professionals".
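A hedged sketch of the PostgreSQL side using the pg gem's COPY support (the connection settings, `records` table, column list, and `records.csv` path are assumptions): once the XML has been converted to CSV, the rows can be streamed in one line at a time.

```ruby
require 'pg'

# Stream a pre-built CSV file into PostgreSQL with COPY, so Ruby never holds
# more than one line in memory at a time.
conn = PG.connect(dbname: 'myapp_production') # assumed connection settings

conn.copy_data("COPY records (name, image) FROM STDIN WITH (FORMAT csv)") do
  File.foreach('records.csv') { |line| conn.put_copy_data(line) }
end

conn.close
```

Alternatively, the same COPY statement can be run directly in psql, keeping Ruby out of the loading step entirely.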
Upvotes: 1
Reputation: 2786
You should use an XML SAX parser, as Juha said. LibXML is the fastest XML library for Ruby, I think.
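For reference, a minimal sketch with libxml-ruby's SAX parser (the `record` element name and the `data.xml` path are assumptions, as above):

```ruby
require 'libxml'

# Callback object for libxml-ruby's streaming SAX parser; only the current
# element's text is kept in memory while the 40 GB file is read.
class RecordCallbacks
  include LibXML::XML::SaxParser::Callbacks

  def on_start_element(name, _attributes)
    @buffer = '' if name == 'record' # 'record' is an assumed element name
  end

  def on_characters(chars)
    @buffer << chars if @buffer
  end

  def on_end_element(name)
    return unless name == 'record' && @buffer
    # decode the Base64 image and insert into PostgreSQL here
    @buffer = nil
  end
end

parser = LibXML::XML::SaxParser.file('data.xml')
parser.callbacks = RecordCallbacks.new
parser.parse
```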
Upvotes: 1