Reputation: 2533
I need to split an XML file (~400 MB) in two so that a legacy app can process it. At the moment it's throwing an exception when the file is larger than about 300 MB.
As I can't change the app which is doing the processing, I thought I could write a console app to split the file in two first. What's the best way of doing this? It needs to be automated so I can't use a text editor, and I'm using C#.
I suppose the considerations are:
Any suggestions?
Upvotes: 1
Views: 1440
Reputation: 2694
You might want to consider making a full copy of the file and then deleting elements from each. You will have to decide at what level the deletions could occur.
It should then be fairly straightforward, from a count of how many elements have been deleted from FileA, to identify how many (and from what starting point) should be deleted from FileB.
Is that feasible for your circumstance?
I have put together the following to describe my thinking. It is not tested, but I would value the comments of the group. Downvote me if you want but I would prefer constructive criticism.
using System.Xml;

namespace ConsoleApplication1
{
    class Program
    {
        // args[0] is the original file, args[1] is a full copy of it
        static void Main(string[] args)
        {
            SplitXML(args[0], args[1]);
        }

        private static void SplitXML(string fileNameA, string fileNameB)
        {
            int deleteCount;
            XmlNodeList childNodes;
            XmlDocument doc;

            // ------------- Process FileA: delete the first half, keep the rest
            doc = new XmlDocument();
            using (XmlReader reader = XmlReader.Create(fileNameA))
            {
                doc.Load(reader);
            }
            childNodes = doc.DocumentElement.ChildNodes;
            deleteCount = childNodes.Count / 2;
            for (int i = 0; i < deleteCount; i++)
            {
                // ChildNodes is a live list, so index 0 is always the next node to delete
                doc.DocumentElement.RemoveChild(childNodes.Item(0));
            }
            using (XmlTextWriter writer = new XmlTextWriter("FileC", null))
            {
                doc.Save(writer);
            }

            // ------------- Process FileB: keep the first half, delete the rest
            doc = new XmlDocument();
            using (XmlReader reader = XmlReader.Create(fileNameB))
            {
                doc.Load(reader);
            }
            childNodes = doc.DocumentElement.ChildNodes;
            // The list shrinks on every removal, so test Count instead of advancing a counter
            while (childNodes.Count > deleteCount)
            {
                doc.DocumentElement.RemoveChild(childNodes.Item(deleteCount));
            }
            using (XmlTextWriter writer = new XmlTextWriter("FileD", null))
            {
                doc.Save(writer);
            }
        }
    }
}
Upvotes: 2
Reputation: 109140
The "best" way is likely to be based on XmlReader and XmlWriter. Using these "streaming" APIs avoids needing to load the whole XML object model into memory (and with the DOM, i.e. XmlDocument, that can need considerably more memory than the text data).
Using these APIs is harder than just loading the document: your implementation needs to track the context (e.g. the current node and its ancestor list). In this case that wouldn't be complex, though: you only need enough context to reopen the enclosing elements when starting each output document.
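To make the streaming idea concrete, here is a minimal, untested-at-scale sketch of how it could look. It assumes the records to split are the direct child elements of the root, that the input has no DOCTYPE (XmlReader.Create prohibits DTDs by default), and that it is acceptable to read the file twice (once to count, once to copy). The class name XmlSplitter and its method names are made up for illustration.

```csharp
using System.Xml;

static class XmlSplitter
{
    // First pass: count the element children of the root without building a DOM.
    public static int CountChildren(string path)
    {
        int count = 0;
        using (XmlReader reader = XmlReader.Create(path))
        {
            reader.MoveToContent();               // position on the root element
            if (reader.IsEmptyElement) return 0;
            reader.Read();                        // step inside the root
            while (!reader.EOF && reader.NodeType != XmlNodeType.EndElement)
            {
                if (reader.NodeType == XmlNodeType.Element)
                {
                    count++;
                    reader.Skip();                // jump past the whole subtree
                }
                else
                {
                    reader.Read();                // whitespace, comments, text
                }
            }
        }
        return count;
    }

    // Second pass: copy the first half of the children to firstPath and the
    // remainder to secondPath, repeating the root element in both outputs.
    public static void Split(string inputPath, string firstPath, string secondPath)
    {
        int firstHalf = (CountChildren(inputPath) + 1) / 2;

        using (XmlReader reader = XmlReader.Create(inputPath))
        using (XmlWriter first = XmlWriter.Create(firstPath))
        using (XmlWriter second = XmlWriter.Create(secondPath))
        {
            reader.MoveToContent();
            bool rootIsEmpty = reader.IsEmptyElement;

            // Reopen the root (name, namespace, attributes) in each output file.
            foreach (XmlWriter writer in new[] { first, second })
            {
                writer.WriteStartDocument();
                writer.WriteStartElement(reader.Prefix, reader.LocalName, reader.NamespaceURI);
                writer.WriteAttributes(reader, false); // leaves the reader on the root element
            }

            if (!rootIsEmpty)
            {
                reader.Read();
                int index = 0;
                while (!reader.EOF && reader.NodeType != XmlNodeType.EndElement)
                {
                    if (reader.NodeType == XmlNodeType.Element)
                    {
                        XmlWriter target = index < firstHalf ? first : second;
                        target.WriteNode(reader, false); // copies the subtree and advances the reader
                        index++;
                    }
                    else
                    {
                        reader.Read();
                    }
                }
            }

            first.WriteEndDocument();   // closes the root element
            second.WriteEndDocument();
        }
    }
}
```

Calling it would look something like XmlSplitter.Split("big.xml", "part1.xml", "part2.xml"), where the file names are placeholders. Splitting by child count is only one possible choice; if the children vary a lot in size, switching output files based on bytes written might give more even halves.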
Upvotes: 2
Reputation: 186078
If it's pure C#, running it as a 64-bit process might solve the problem for no effort at all (assuming you have a 64-bit Windows at hand).
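If you want to confirm which mode a process actually ends up in, a small check like the one below could help. It is only a sketch: it assumes .NET Framework 4.0 or later (where these Environment properties exist), and the class name BitnessCheck is made up. Whether a pure C# executable runs as 64-bit depends on its platform target (for example AnyCPU without "Prefer 32-bit", or x64).

```csharp
using System;

static class BitnessCheck
{
    // Describes whether the OS and the current process are 64-bit.
    public static string Describe()
    {
        return "64-bit OS: " + Environment.Is64BitOperatingSystem
             + ", 64-bit process: " + Environment.Is64BitProcess;
    }
}
```

Printing BitnessCheck.Describe() at startup shows whether the process really is running as 64-bit.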
Upvotes: 0