Reputation: 365
As the title suggests I need to append log data to an XML file without buffering to RAM. The XML File is made up of LogEntry elements, which contain 82 child elements that contain data. These files can get quite large and seeing as it will form part of a Windows CE6 program we have very limited memory.
Having done a fair amount of research, it's apparent that the most common methods are to use XDocument or LINQ to XML to read in the existing document before appending to it and writing out the new document. Using XmlWriter and XmlReader in concert seems to be the best way for me to append to the file, but all my attempts so far are hugely impractical and require many if statements to direct what to write, in order to prevent duplicate or empty elements being written.
The essence of what I'm doing is:
//Create an XmlReader to read current WorkLog.
using (XmlReader xmlRead = XmlReader.Create("WorkLog.xml"))
{
    //Create a XmlWriterSettings and set indent
    //to true to correctly format the document
    XmlWriterSettings writerSettings = new XmlWriterSettings();
    writerSettings.Indent = true;
    writerSettings.IndentChars = "\t";
    //Create a new XmlWriter to output to
    using (XmlWriter xmlWriter = XmlWriter.Create("New.xml", writerSettings))
    {
        //Starts the document
        xmlWriter.WriteStartDocument();
        //While the XmlReader is still reading (essentially !EOF)
        while (xmlRead.Read())
        {
            //FSM to direct writing of OLD Log data to new file
            switch (xmlRead.NodeType)
            {
                case XmlNodeType.Element:
                    //Handle the copying of an element node
                    //Contains many if statements to handle root node &
                    //attributes and to skip nodes that contain text
                    break;
                case XmlNodeType.Text:
                    //Handle the copying of a text node
                    break;
                case XmlNodeType.EndElement:
                    //Handle the copying of an End Element node
                    break;
            }
        }
        xmlWriter.WriteEndDocument();
    }
}
I'm confident I could append to the file this way but it is highly impractical to do so - does anyone know of any memory efficient methods that my hours of searching hasn't turned up?
I'm happy to post my current code to do this if required - but as I mentioned it is extremely large and is actually pretty nasty at the moment so I'll leave it out for now.
Upvotes: 4
Views: 2332
Reputation: 31116
Your approach of using XmlReader is actually the way to go... but as you also say, it's very impractical.
So is a hack justified?
The reason XmlReader is the correct approach is that XML has a bunch of features that you might encounter, which require you to read the document from top to bottom. Normally XmlReader copes with these situations, leaving you with plain tags and so on. For example, given the following declarations:
<!ENTITY % pub "Éditions Gallimard" >
<!ENTITY rights "All rights reserved" >
<!ENTITY book "La Peste: Albert Camus, © 1947 %pub;. &rights;" >
then the replacement text for the entity book is:
La Peste: Albert Camus,
© 1947 Éditions Gallimard. &rights;
If you haven't read the ENTITY declarations, it's impossible to do the 'translation' to the correct XML. Fortunately, few people use these kinds of constructions, so it's okay to assume your XML doesn't use them and simply rewrite the closing root tag.
That said, the only valid way in XML to close a tag is to use </Foo> with optional whitespace before the trailing > (see http://www.w3.org/TR/2008/REC-xml-20081126/#sec-starttags). This basically means you can skip to the end, read enough data, and check whether it contains the end tag. If it does, you can insert your own content just before it. If not, seek back a bit and try again.
Nasty little encodings
The last thing to be aware of is the encoding of your file. While you can construct an XmlTextReader from a stream, the stream deals in bytes; your reader detects the encoding and starts reading. Fortunately, XmlTextReader exposes the Encoding as a property, so you can use that. Encoding is important because you might need more than one byte for each character; this is especially an issue with UTF-16 or UTF-32. The way to handle this is to convert your token to bytes and then do the matching on bytes.
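To make the byte-matching point concrete, here is a minimal sketch of turning the search token into bytes. The class name `EndTagBytes` and the root name `root` are placeholders, not something from the question.

```csharp
using System.Text;

static class EndTagBytes
{
    // Returns the byte pattern to look for in the raw file, e.g. "</root".
    public static byte[] For(string rootName, Encoding encoding)
    {
        // GetBytes does not prepend a byte order mark, so the result can be
        // compared directly against bytes taken from the middle of the file.
        return encoding.GetBytes("</" + rootName);
    }
}
```

For a root named `root`, this gives 6 bytes with `Encoding.UTF8` and 12 bytes with `Encoding.Unicode` (UTF-16), which is why matching on characters and assuming one byte each would go wrong.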
The root = the trailer assumption
Since I don't really feel like checking the spaces and trailing '>' (see W3C link above), I also assume it's a valid XML file, which means that every opening tag is closed as well. This means you can simply check for </root, making the matching process a bit easier. (NOTE: you might even just check for the last </ in the file, but I prefer my code to be a bit more robust against incorrect XML.)
Putting it all together
Here goes... (I haven't tested it, but it should more or less work)
public bool FindAppendPoint(Stream stream)
{
    XmlTextReader xr = new XmlTextReader(stream);
    string rootElement = null;
    while (xr.Read())
    {
        if (xr.NodeType == XmlNodeType.Element)
        {
            rootElement = xr.Name;
            break;
        }
    }
    if (rootElement == null)
    {
        // Well, apparently there's no root... You can start a new file I suppose
        return false;
    }

    // The reader buffers its input, so stream.Position is not a reliable marker
    // for the end of the start tag. Use the start of the file as the lower bound.
    long start = 0;
    long len = stream.Length;
    long end = Math.Max(start, len - 1024);
    byte[] endTag = xr.Encoding.GetBytes("</" + rootElement);
    while (true)
    {
        // Read the tail of the file, from 'end' up to the end of the stream
        byte[] data = new byte[len - end];
        stream.Seek(end, SeekOrigin.Begin);
        int read = 0;
        while (read < data.Length)
        {
            int n = stream.Read(data, read, data.Length - read);
            if (n <= 0) break;
            read += n;
        }
        // Loop backwards till we find the end tag
        for (int i = read - endTag.Length; i >= 0; --i)
        {
            int j;
            for (j = 0; j < endTag.Length && endTag[j] == data[i + j]; ++j) { }
            if (j == endTag.Length)
            {
                // We found a match! Position the stream on the '<' of the end tag
                stream.Seek(end + i, SeekOrigin.Begin);
                AppendXml(stream, xr.Encoding);
                return true;
            }
        }
        // Hmm, the end tag is followed by a lot of whitespace... oh well
        //
        // It's okay to skip back a bit, just have to make sure that we don't skip <0
        if (end == start)
        {
            // Nope, no go.
            return false;
        }
        end = Math.Max(start, end - 1024);
    }
}
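`AppendXml` is called above but never defined. A minimal sketch of what it might do follows, assuming the stream is writable and already positioned on the `<` of the closing root tag. The extra parameters `rootName` and `entryXml`, and the class name `LogAppender`, are hypothetical and not part of the original call.

```csharp
using System.IO;
using System.Text;

static class LogAppender
{
    // Hypothetical helper: overwrites the old closing root tag with the new
    // entry and then writes the closing tag again.
    // Assumes stream.Position is on the '<' of the closing root tag.
    public static void AppendXml(Stream stream, Encoding encoding,
                                 string rootName, string entryXml)
    {
        byte[] payload = encoding.GetBytes(entryXml + "</" + rootName + ">");
        stream.Write(payload, 0, payload.Length);

        // Cut off anything that followed the old closing tag (for example
        // trailing whitespace) so the file ends with the new closing tag.
        stream.SetLength(stream.Position);
        stream.Flush();
    }
}
```

Only the new entry and the closing tag are written, so memory use does not depend on the size of the existing log.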
Upvotes: 1
Reputation: 2194
Assume that the log file is like so (only two levels):
<Logs>
    <Log>abc1</Log>
    <Log>abc1</Log>
    <Log>abc1</Log>
</Logs>
I used FileStream to seek to the end and read back the closing element.
private static void Append(string xmlElement)
{
    // Assumes a UTF-8 (or ASCII) file, where '<' is always a single byte
    const byte lessThan = (byte) '<';
    using (FileStream stream = File.Open(@"C:\log.xml", FileMode.OpenOrCreate))
    {
        if (stream.Length == 0)
        {
            // New file: start with an empty root element
            byte[] rootElement = Encoding.UTF8.GetBytes("<Logs></Logs>");
            stream.Write(rootElement, 0, rootElement.Length);
        }
        // Walk backwards from the end, collecting bytes until we hit the
        // '<' that starts the closing root tag
        List<byte> buffer = new List<byte>();
        stream.Seek(0, SeekOrigin.End);
        do
        {
            stream.Seek(-1, SeekOrigin.Current);
            buffer.Insert(0, (byte) stream.ReadByte());
            stream.Seek(-1, SeekOrigin.Current);
        } while (buffer[0] != lessThan);
        // Overwrite the closing tag with the new element, then put the tag back
        byte[] toAdd = Encoding.UTF8.GetBytes(xmlElement);
        stream.Write(toAdd, 0, toAdd.Length);
        stream.Write(buffer.ToArray(), 0, buffer.Count);
    }
}
Upvotes: 1
Reputation: 8091
XmlReader is the only way to process the XML without loading the complete document into memory. It doesn't support modification, but you can copy the XML from the source document and apply modifications along the way. There is no other way.
Parsing XML as a text document looks like the hard way.
It's better to use the XmlReader/XmlWriter classes, where the parsing and CRUD logic is already implemented, together with your own classes built on either the Visitor or the State GoF pattern. The Visitor pattern will reduce the number of ifs and make your design easily extensible. Even if you would rather parse the XML document without XmlReader/XmlWriter, I recommend using them in this situation.
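As an illustration of copying with modifications, here is a rough sketch that streams the old document into a new one and adds one element before the root is closed. It is not a Visitor implementation: it leans on `XmlWriter.WriteNode` to copy each child, which removes most of the per-node-type branching. The names `StreamingLogCopy`, `entryName` and `entryText` are hypothetical, and availability of these calls on the target Windows CE framework is assumed rather than verified.

```csharp
using System.IO;
using System.Xml;

static class StreamingLogCopy
{
    // Copies the document from source to target node by node and adds one
    // new child element just before the root element is closed.
    public static void CopyAndAppend(TextReader source, TextWriter target,
                                     string entryName, string entryText)
    {
        XmlReaderSettings readerSettings = new XmlReaderSettings();
        readerSettings.IgnoreWhitespace = true;   // let the writer re-indent

        XmlWriterSettings writerSettings = new XmlWriterSettings();
        writerSettings.Indent = true;

        using (XmlReader reader = XmlReader.Create(source, readerSettings))
        using (XmlWriter writer = XmlWriter.Create(target, writerSettings))
        {
            reader.MoveToContent();               // skip the prolog, stop on the root
            writer.WriteStartElement(reader.Prefix, reader.LocalName, reader.NamespaceURI);
            writer.WriteAttributes(reader, false);

            if (!reader.IsEmptyElement)
            {
                reader.Read();                    // step into the root's content
                while (reader.NodeType != XmlNodeType.EndElement)
                {
                    // Copies the current node with its whole subtree and
                    // leaves the reader on the node that follows it.
                    writer.WriteNode(reader, false);
                }
            }

            writer.WriteElementString(entryName, entryText);
            writer.WriteEndElement();             // close the root
        }
    }
}
```

Only one node is held in memory at a time, but the whole file is still rewritten on every append, so the cost in I/O grows with the size of the log.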
Upvotes: 1
Reputation: 28752
If a hack is justified, I would go to the end of the file, rewind past the end tag and write the new element and the end tag. For further improvement, you could even cache the offset of the beginning of last element.
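A sketch of the caching idea, under stated assumptions: the file is UTF-8, it ends with exactly the closing root tag (no trailing whitespace), and nothing else writes to it in between. Note that this variant caches the offset of the closing tag rather than of the last element. The class `CachedLogAppender` is hypothetical.

```csharp
using System.IO;
using System.Text;

// Hypothetical sketch: remembers where the closing root tag starts so that
// repeated appends do not have to search for it again.
sealed class CachedLogAppender
{
    private readonly byte[] closingTag;
    private long closingTagOffset = -1;           // -1 means "not located yet"

    public CachedLogAppender(string rootName)
    {
        closingTag = Encoding.UTF8.GetBytes("</" + rootName + ">");
    }

    public void Append(Stream stream, string entryXml)
    {
        if (closingTagOffset < 0)
        {
            // Assumption: the file ends with exactly the closing tag.
            closingTagOffset = stream.Length - closingTag.Length;
        }

        byte[] entry = Encoding.UTF8.GetBytes(entryXml);
        stream.Seek(closingTagOffset, SeekOrigin.Begin);
        stream.Write(entry, 0, entry.Length);             // overwrites the old end tag
        stream.Write(closingTag, 0, closingTag.Length);   // and puts it back
        stream.Flush();

        closingTagOffset += entry.Length;                 // new start of the end tag
    }
}
```

After the first call, each append is one seek and two writes, regardless of how large the log has grown.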
Upvotes: 2
Reputation: 325
If you know your XML structure, consider using a stream writer:
1. Open the file as a FileStream.
2. Move the position to the tag you want to replace (the closing tag), that is, to its "<".
3. Write your log data in the correct XML format, then write the closing tag again at the end.
In other words, "process the XML file as a text editor would".
Upvotes: 3