Matt

Reputation: 51

Edit very large XML files

I would like to create a text box which loads XML files and lets users edit them. However, I cannot use XmlDocument to load them, since the files can be very large. I am looking for options to stream/load the XML document in chunks so that I do not get out-of-memory errors -- at the same time, performance is important too. Could you let me know what would be good options?

Upvotes: 5

Views: 2394

Answers (8)

csharptest.net

Reputation: 64218

I've not tried it with files that big, but you should look at Microsoft's XML Notepad 2007. It claims load times under a second for a 3 MB document.

http://www.microsoft.com/download/en/details.aspx?id=7973

Upvotes: 0

John Saunders

Reputation: 161783

I think you're trying to do too much in your text box here. Why not have the users edit the XML document in a tool that's meant for editing XML? Such a tool might even be able to handle large XML files.

Then, when the XML has been edited, the users can upload the complete XML document to your site.

Upvotes: 0

artur02

Reputation: 4479

You can use memory-mapped files to handle huge files. See the MemoryMappedFile class on MSDN. OK, it's low-level, but it can help. It's available from .NET 4 onwards.
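A minimal sketch of the memory-mapped approach: read one chunk from an arbitrary offset of a large file without pulling the whole thing into memory. The file name `huge.xml` is a placeholder.

```csharp
using System;
using System.IO.MemoryMappedFiles;
using System.Text;

class MmfChunkDemo
{
    static void Main()
    {
        // Map the file, then create a view over just the bytes we need.
        using (var mmf = MemoryMappedFile.CreateFromFile("huge.xml"))
        using (var accessor = mmf.CreateViewAccessor(0, 4096))
        {
            var buffer = new byte[4096];
            accessor.ReadArray(0, buffer, 0, buffer.Length);
            Console.WriteLine(Encoding.UTF8.GetString(buffer));
        }
    }
}
```

By moving the view's offset you can page through the file in windows, which is what makes this viable for multi-gigabyte files.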

You can use readers with XmlNameTable support. It enables string interning, so if a name appears frequently in a document, a single string instance represents it in memory.
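As a sketch, wiring a NameTable into an XmlReader looks like this (`huge.xml` is a placeholder path):

```csharp
using System;
using System.Xml;

class NameTableDemo
{
    static void Main()
    {
        // All element/attribute names read through this reader are
        // interned in the shared NameTable, so repeated names cost
        // one string instance instead of one per occurrence.
        var settings = new XmlReaderSettings { NameTable = new NameTable() };
        using (var reader = XmlReader.Create("huge.xml", settings))
        {
            while (reader.Read())
            {
                // reader.Name returns the interned instance here.
            }
        }
    }
}
```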

You can also try a third-party XML parser, e.g. Altova XML, which is used in the company's own products, so it may be able to do more than the built-in .NET classes. BTW, it's free.

Upvotes: 1

ScottE

Reputation: 21630

Why bother reading the XML into an XmlDocument at all if all you're doing is pushing it into a textbox?

How big are you talking about here? Have you tried streaming it into a textbox yet?

sometextarea.Text = System.IO.File.ReadAllText(Server.MapPath("somexml.xml"));

Now, saving it back to the filesystem is a different story, especially if you want it to be 1) valid XML and 2) valid against a schema.
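For the second point, one way to check the edited text before saving is a validating pass with XmlReaderSettings. This is only a sketch; `schema.xsd` is a placeholder and you'd likely want to surface the error message rather than just return false.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

class ValidateDemo
{
    static bool IsValid(string xmlText)
    {
        var settings = new XmlReaderSettings { ValidationType = ValidationType.Schema };
        settings.Schemas.Add(null, "schema.xsd"); // placeholder schema file

        try
        {
            // Reading to the end validates every node against the schema.
            using (var reader = XmlReader.Create(new StringReader(xmlText), settings))
                while (reader.Read()) { }
            return true;
        }
        catch (XmlException) { return false; }                  // not well-formed
        catch (XmlSchemaValidationException) { return false; }  // schema-invalid
    }
}
```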

Upvotes: 2

War

Reputation: 8618

I've had similar issues doing this type of thing with CSV file data.

DRapp is right; it's likely the cleanest way to approach the situation, assuming the user isn't expecting to read everything at root level in one hit.

In theory, all you need to be careful of is which tags are open or closed, and you can store just this core info in a string; it shouldn't be too bulky.

As DRapp suggests, you simply load the data into a stream, and with a bit of careful position management you should be able to read and write.

Your biggest issue is that if, say, at point x you want to replace the data in node y with data of a different length, you would either end up with a gap in the file or overwrite the next node (or a portion of it).

So each time a change is made, you essentially need to stream the file into another file up to the point where the edit starts, then stream in the edit, then stream in the rest of the file.

You should be able to do all this with StreamReader and StreamWriter objects sitting on top of one stream instance on the original file, plus a StreamWriter on a second temp file.
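The copy-around-the-edit idea above can be sketched like this. It works on raw bytes rather than reader/writer objects to keep the offset arithmetic simple; the method name, offsets, and file names are hypothetical.

```csharp
using System;
using System.IO;

class SpliceDemo
{
    // Replace editLength bytes starting at editStart with `replacement`,
    // by streaming the original into a temp file and swapping it in.
    static void Splice(string path, long editStart, long editLength, byte[] replacement)
    {
        string tmp = path + ".tmp";
        using (var input = File.OpenRead(path))
        using (var output = File.Create(tmp))
        {
            CopyBytes(input, output, editStart);        // everything before the edit
            input.Seek(editLength, SeekOrigin.Current); // skip the bytes being replaced
            output.Write(replacement, 0, replacement.Length);
            input.CopyTo(output);                       // the rest of the file
        }
        File.Delete(path);
        File.Move(tmp, path);
    }

    static void CopyBytes(Stream from, Stream to, long count)
    {
        var buffer = new byte[81920];
        while (count > 0)
        {
            int read = from.Read(buffer, 0, (int)Math.Min(buffer.Length, count));
            if (read <= 0) break;
            to.Write(buffer, 0, read);
            count -= read;
        }
    }
}
```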

It's never going to be fast, though, purely because writing updates to a 1.x GB file takes time on the hard drive, and no optimising is going to change that.

Upvotes: 1

code4life

Reputation: 15794

Try Scintilla.NET; it's miles better than a TextBox!

http://scintillanet.codeplex.com/

Loading the document is easy:

using (TextReader reader = new StreamReader(myFilePath, Encoding.UTF8))
{
    scintillaDocument.Text = reader.ReadToEnd();
}

Or:

scintillaDocument.Text = File.ReadAllText(myFilePath);

Upvotes: 2

DRapp

Reputation: 48139

I too had to deal with large XML files (1+ GB) and had to parse elements out to import into a MySQL database. I was successful using a text-based stream reader. What I did was keep reading in chunks until I had one complete single "record" of the XML, based on the known structure:

<perRecordTag>
    <other data / node elements>
</perRecordTag>

Then, I would load an XmlDocument from that string (with anything before and after the record stripped). I could then parse, review, or do whatever with that single record and move on.

Obviously, I had to retain everything after the end of one XML record to start reading the next record element, but that was no problem.
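A similar per-record pass can be sketched with XmlReader alone, which handles the "retain the leftovers" bookkeeping for you. The element name `perRecordTag` comes from the example above; `huge.xml` is a placeholder path.

```csharp
using System;
using System.Xml;

class PerRecordDemo
{
    static void Main()
    {
        using (var reader = XmlReader.Create("huge.xml"))
        {
            // Jump from record to record; only one record's XML is ever
            // materialized into an XmlDocument at a time.
            while (reader.ReadToFollowing("perRecordTag"))
            {
                var doc = new XmlDocument();
                doc.LoadXml(reader.ReadOuterXml());
                // parse, review, import this single record here
            }
        }
    }
}
```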

Upvotes: 1

Carra

Reputation: 17964

You're probably looking for an XmlTextReader.
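For illustration, a minimal forward-only pass with XmlTextReader: memory use stays flat because only the current node is held. Counting elements is just a stand-in for whatever per-node work you need, and `huge.xml` is a placeholder path.

```csharp
using System;
using System.Xml;

class ReaderDemo
{
    static void Main()
    {
        using (var reader = new XmlTextReader("huge.xml"))
        {
            int elements = 0;
            while (reader.Read())             // pull one node at a time
                if (reader.NodeType == XmlNodeType.Element)
                    elements++;
            Console.WriteLine(elements);
        }
    }
}
```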

Upvotes: -2
