Reputation: 706
I have a problem that I've been working on for quite some time now. I have an XML file with over 50,000 records (each record has 3 levels). This file is used by one of my applications to control document sending (a record holds, among other information, the type of document that has to be sent to a certain person). So in my application I load the XML file into an XmlDocument, and then, using the SelectNodes method, I create an XmlNodeList from which I read the data I want. The process works like this: our worker takes the person's ID card (a simple card with a barcode) and reads it with a barcode reader. When the barcode value has been read, my application finds the person with that ID in the XML file and stores the type of the document in a string variable. Then the worker takes the document and reads its barcode, and if the value of the document's barcode matches the value in the string variable, the application makes a record that a document of type xxxxxxxx will be sent to the person with ID yyyyyyyyy. This is very simple code, it works perfectly for now, and this is how it looks. On the textBox1_TextChanged event (worker reads the person's ID):
foreach (XmlNode node in NodeList)
{
    // Compare the scanned person ID with the record's ID attribute
    if (String.Compare(node.Attributes.GetNamedItem("ID").Value, textBox1.Text) == 0)
    {
        // ChildNodes (plural) is the property name; the doctype sits on the
        // first child of the record's fourth child node
        ControlString = node.ChildNodes[3].FirstChild.Attributes.GetNamedItem("doctype").Value;
        break;
    }
}
textBox2.Focus();
And on the textBox2_TextChanged event (worker reads the document's barcode):
if (String.Compare(textBox2.Text, ControlString) == 0)
{
    // Create a record and insert it into a SQL database
}
My question is: how will my application perform with larger XML files (I was told that the file might grow to as many as 500,000 records)? Will this approach still be valid, or will I need to cut the file into smaller files? If I have to cut it, please give me an idea, with some code samples if possible. I've tried to do it like this, reading an entire record and storing it in a string:
private void WriteXml(XmlNode record)
{
    // Rebuild the record's opening tag, copy its inner XML,
    // and append the whole record to one big output string
    tempXML = record.InnerXml;
    temp = "<" + record.Name + " code=\"" + record.Attributes.GetNamedItem("code").Value + "\">" + Environment.NewLine;
    temp += tempXML + Environment.NewLine;
    temp += "</" + record.Name + ">";
    SmallerXMLDocument += temp + Environment.NewLine;
    temp = "";
    i++;
}
tempXML, temp and SmallerXMLDocument are all string variables.
And then in the button_Click method I load the XML file into an XmlNodeList (again using the XmlDocument.SelectNodes method) and I try to create one big string value that would hold all the matching records, like this:
foreach (XmlNode node in nodes)
{
    // Collect only records whose doctype matches doctype1
    if (String.Compare(node.ChildNodes[3].FirstChild.Attributes.GetNamedItem("doctype").Value, doctype1) == 0)
    {
        WriteXml(node);
    }
}
My idea was to create a string value (in this case called SmallerXMLDocument) and, once I have passed through the entire XML file, simply copy the value of that string into a new file. This works, but only for files with up to 2,000 records (and mine has far more than that). So, if I need to cut the file into smaller pieces, what would be the best way to do it (keeping in mind that there could be up to half a million records in an XML file)?
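To make the goal concrete, here is roughly the shape I imagine the splitting code should take if it streamed the records straight to disk instead of collecting everything in one string. This is only a sketch: the file names, the "records"/"record" element names and the chunk size are placeholders for my real structure.

// Sketch only: stream the big file and copy every chunk of recordsPerFile
// records into its own smaller XML file, so nothing large is kept in memory.
int recordsPerFile = 50000;     // placeholder chunk size
int fileIndex = 0;
int count = 0;
XmlWriter writer = null;

using (XmlReader reader = XmlReader.Create("bigfile.xml"))
{
    while (reader.ReadToFollowing("record"))
    {
        if (count % recordsPerFile == 0)
        {
            // Close the previous chunk (if any) and start a new output file
            if (writer != null)
            {
                writer.WriteEndElement();
                writer.Close();
            }
            fileIndex++;
            writer = XmlWriter.Create("part" + fileIndex + ".xml");
            writer.WriteStartElement("records");
        }

        // Copy the whole <record> subtree into the current output file
        using (XmlReader record = reader.ReadSubtree())
        {
            writer.WriteNode(record, true);
        }
        count++;
    }

    if (writer != null)
    {
        writer.WriteEndElement();
        writer.Close();
    }
}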
Thanks
Upvotes: 2
Views: 2044
Reputation: 5480
What it comes down to is you need to access the data, so whether it's 50,000 rows in 1 file, or 1000 rows in 50 files, you've got the same amount of data.
There's nothing stopping you from using something like SQLite or SQL Server Compact in your client. There are many benefits to this. You could use XmlReader to parse the data into tables in your DB. Having done that, you can now use the SQL engine to find the rows you need, using joins to find related rows far more easily. You're also not storing vast amounts of data in memory. If the XML might change, watch the file for changes and refresh the DB when it does.
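As a rough sketch of that idea, using the Microsoft.Data.Sqlite package (the table layout and the "record"/"ID"/"doctype" names are placeholders; in your real file the doctype sits a few levels deeper, so extracting it would need the same navigation as your existing code):

using System.Xml;
using Microsoft.Data.Sqlite;

class DocumentStore
{
    // Stream the XML once and load it into a local SQLite table.
    public static void Load(string xmlPath, string dbPath)
    {
        using (var connection = new SqliteConnection("Data Source=" + dbPath))
        {
            connection.Open();

            var create = connection.CreateCommand();
            create.CommandText =
                "CREATE TABLE IF NOT EXISTS records (id TEXT PRIMARY KEY, doctype TEXT)";
            create.ExecuteNonQuery();

            using (var transaction = connection.BeginTransaction())
            using (var reader = XmlReader.Create(xmlPath))
            {
                while (reader.ReadToFollowing("record"))
                {
                    var insert = connection.CreateCommand();
                    insert.Transaction = transaction;
                    insert.CommandText =
                        "INSERT OR REPLACE INTO records (id, doctype) VALUES ($id, $doctype)";
                    insert.Parameters.AddWithValue("$id", reader.GetAttribute("ID"));
                    // Placeholder: in the real file the doctype is on a nested node
                    insert.Parameters.AddWithValue("$doctype", reader.GetAttribute("doctype"));
                    insert.ExecuteNonQuery();
                }
                transaction.Commit();
            }
        }
    }

    // Lookup for one scanned person ID; this replaces the foreach over the node list.
    public static string FindDoctype(string dbPath, string personId)
    {
        using (var connection = new SqliteConnection("Data Source=" + dbPath))
        {
            connection.Open();
            var select = connection.CreateCommand();
            select.CommandText = "SELECT doctype FROM records WHERE id = $id";
            select.Parameters.AddWithValue("$id", personId);
            return select.ExecuteScalar() as string;
        }
    }
}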
Upvotes: 0
Reputation: 1712
First off, I suspect you're abusing the XML API. You can query the XmlDocument directly with XPath to get your result straight away, without first selecting a list of records and iterating over them. At no point should you need to convert parts of the XML tree to strings.
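For example, something along these lines replaces the whole foreach (the "record" element name and the child-node path are guesses taken from the code in the question):

// Single XPath lookup instead of iterating over a pre-selected node list
XmlNode match = xmlDocument.SelectSingleNode("//record[@ID='" + textBox1.Text + "']");
if (match != null)
{
    ControlString = match.ChildNodes[3].FirstChild.Attributes.GetNamedItem("doctype").Value;
}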
The approach of loading the entire XML document into memory will work just fine as long as you don't mind spending 50 to 500 megabytes of RAM on your application.
If you want to save RAM you should use XmlReader to stream the XML from disk.
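A minimal sketch of that streaming approach, assuming a "record" element with an "ID" attribute as in the question's code, a doctype attribute somewhere inside the record, and personId standing for the scanned value:

string doctype = null;
using (XmlReader reader = XmlReader.Create("bigfile.xml"))
{
    // Walk the file forward; only one record is ever held in memory
    while (reader.ReadToFollowing("record"))
    {
        if (reader.GetAttribute("ID") == personId)
        {
            // Scan the matching record's subtree for the node carrying "doctype"
            using (XmlReader record = reader.ReadSubtree())
            {
                while (record.Read())
                {
                    string value = record.GetAttribute("doctype");
                    if (value != null)
                    {
                        doctype = value;
                        break;
                    }
                }
            }
            break;
        }
    }
}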
Upvotes: 2