Reputation: 706
I have an XML file that contains over 50 000 records (future files might have up to 500 000). Each record has three levels: the main level (used to distinguish records), the common data level (tags whose attributes define each record), and a third level that contains the data specific to each record (mostly as attributes, but sometimes as inner text). My task is to "dissect" this file into multiple smaller files. An attribute on the third level determines which group the whole record belongs to.
The algorithm should go like this:
For each record in the file:
So my question is: what is the easiest (and most efficient) way to copy data into a new file? Keep in mind that I need to copy the entire record, not just some specific data. I'm working in C# using VS 2010.
Upvotes: 0
Views: 578
Reputation: 11396
The most efficient way (regarding performance) would be to have a single XmlReader instance, going through your large file.
Since you have several groups that could be the destination, you should have multiple instances of XmlWriter, which you would create on demand and store in a dictionary indexed by "group key", for the next iteration.
Using XmlReader and XmlWriter you avoid loading the entire file in memory.
To keep track of the nested levels you go through you could use a Stack, pushing the items as you navigate inwards and popping as you navigate outwards, or just local variables in your method.
Don't forget to close your Stream instances when you are done.
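A minimal sketch of the reader-plus-dictionary-of-writers idea, under some loud assumptions: the element name "record", the attribute name "group", the output root name "records", and the class `RecordSplitter` are all hypothetical, since the question does not show the real schema. Instead of tracking nesting with a Stack, this sketch swaps in `XNode.ReadFrom` to materialize one record at a time as an `XElement`, so only a single record is held in memory while its group attribute is looked up.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

public static class RecordSplitter
{
    // Assumed schema (hypothetical): <records><record>...<x group="..."/>...</record></records>
    // outputDirectory must already exist; group values are assumed to be valid file names.
    public static void Split(string sourcePath, string outputDirectory)
    {
        var writers = new Dictionary<string, XmlWriter>();
        var settings = new XmlWriterSettings { Indent = true };
        try
        {
            using (XmlReader reader = XmlReader.Create(sourcePath))
            {
                while (!reader.EOF)
                {
                    if (reader.NodeType == XmlNodeType.Element && reader.Name == "record")
                    {
                        // ReadFrom consumes the whole record and leaves the reader on the
                        // node that follows it, so no extra Read() call is made here.
                        XElement record = (XElement)XNode.ReadFrom(reader);

                        // Look for the group attribute anywhere below the record.
                        XAttribute group = record.Descendants().Attributes("group").FirstOrDefault();
                        string key = group != null ? group.Value : "ungrouped";

                        // Create the writer for this group on first use, then reuse it.
                        XmlWriter writer;
                        if (!writers.TryGetValue(key, out writer))
                        {
                            writer = XmlWriter.Create(Path.Combine(outputDirectory, key + ".xml"), settings);
                            writer.WriteStartDocument();
                            writer.WriteStartElement("records");
                            writers.Add(key, writer);
                        }
                        record.WriteTo(writer);
                    }
                    else
                    {
                        reader.Read();
                    }
                }
            }
        }
        finally
        {
            foreach (XmlWriter writer in writers.Values)
            {
                writer.WriteEndDocument(); // closes the still-open root element
                writer.Close();
            }
        }
    }
}
```

Calling `RecordSplitter.Split("Test.xml", "out")` would produce one file per distinct group value. Every group keeps an open file handle until the end, which is fine for a handful of groups but may need rethinking if there are thousands.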
Upvotes: 1
Reputation: 2265
You can perform the operation with the classes in System.Xml. Load the document, collect the record elements in a List<XmlElement>, and walk the three levels of each one.
using System.Xml;
// Note: XmlDocument loads the whole file into memory
XmlDocument doc = new XmlDocument();
doc.Load("Test.xml");
XmlElement root = doc.DocumentElement;
// Perform your read and write operations here
doc.Save("Test.xml");
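One way the in-memory approach could be fleshed out for the splitting task is sketched below. The names "record", "group", "records", and the class `InMemorySplitter` are hypothetical placeholders for the real schema, and because `XmlDocument` holds the whole source in memory, this is probably only reasonable for files that fit comfortably in RAM.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class InMemorySplitter
{
    // Assumed schema (hypothetical): "record" children under the root, with a
    // "group" attribute somewhere below each record. outputDirectory must exist.
    public static void Split(string sourcePath, string outputDirectory)
    {
        XmlDocument source = new XmlDocument();
        source.Load(sourcePath);

        var outputs = new Dictionary<string, XmlDocument>();
        foreach (XmlElement record in source.DocumentElement.SelectNodes("record"))
        {
            // First "group" attribute on the record or any of its descendants.
            XmlNode groupAttribute = record.SelectSingleNode(".//@group");
            string key = groupAttribute != null ? groupAttribute.Value : "ungrouped";

            XmlDocument target;
            if (!outputs.TryGetValue(key, out target))
            {
                target = new XmlDocument();
                target.AppendChild(target.CreateElement("records"));
                outputs.Add(key, target);
            }

            // A node belongs to the document that created it, so import a deep copy.
            target.DocumentElement.AppendChild(target.ImportNode(record, true));
        }

        foreach (KeyValuePair<string, XmlDocument> pair in outputs)
        {
            pair.Value.Save(Path.Combine(outputDirectory, pair.Key + ".xml"));
        }
    }
}
```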
Upvotes: 0
Reputation: 1038930
You could use an XmlReader to progress through the nodes of the source file and, once you encounter a node that meets your requirements, simply read it and copy it to a new file (the reader's ReadOuterXml method returns the current node, including its child nodes, as a string that you could store in a new file; ReadInnerXml returns only the content inside the node).
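A rough sketch of that copy loop, assuming records are elements named "record" and the output is wrapped in a "records" root (both names, the class `RecordCopier`, and the `matches` callback are hypothetical). It relies on `XmlReader.ReadOuterXml`, which returns the current element with its children as a string and moves the reader past the element's end tag.

```csharp
using System;
using System.IO;
using System.Xml;

public static class RecordCopier
{
    // Copies every "record" element whose markup satisfies `matches` into targetPath
    // and returns the number of records copied. Element names are assumptions.
    public static int CopyMatching(string sourcePath, string targetPath, Func<string, bool> matches)
    {
        int copied = 0;
        using (XmlReader reader = XmlReader.Create(sourcePath))
        using (StreamWriter output = new StreamWriter(targetPath))
        {
            output.WriteLine("<records>");
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "record")
                {
                    // ReadOuterXml already advances past the record, so Read() is
                    // only called in the else branch; otherwise records could be skipped.
                    string recordXml = reader.ReadOuterXml();
                    if (matches(recordXml))
                    {
                        output.WriteLine(recordXml);
                        copied++;
                    }
                }
                else
                {
                    reader.Read();
                }
            }
            output.WriteLine("</records>");
        }
        return copied;
    }
}
```

How `matches` decides is left to the caller; for the third-level group attribute it could, for example, parse the string with `XElement.Parse` and inspect the attribute, at the cost of parsing each record twice.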
By the way, if you expect your XML to grow to millions of records, I would recommend anticipating this growth and switching to a database, which is better suited to handling such volumes of data.
Upvotes: 1