Reputation: 365

Copying & Appending an Element to an XML Document without buffering to RAM

As the title suggests, I need to append log data to an XML file without buffering it to RAM. The XML file is made up of LogEntry elements, each containing 82 child elements that hold data. These files can get quite large, and since this will form part of a Windows CE6 program, we have very limited memory.

Having done a fair amount of research, it's apparent that the most common methods use XDocument (LINQ to XML) to read in the existing document before appending to it and writing out the new document. Using XmlWriter and XmlReader in concert seems to be the best way for me to append to the file, but all my attempts so far are hugely impractical and require if statements to direct what to write, in order to prevent duplicate or empty elements from being written.

The essence of what I'm doing is:

//Needs: using System.Xml;
//Create an XmlReader to read the current WorkLog.
using (XmlReader xmlRead = XmlReader.Create("WorkLog.xml"))
{
   //Create an XmlWriterSettings and set Indent
   //to true to format the document correctly
   XmlWriterSettings writerSettings = new XmlWriterSettings();
   writerSettings.Indent = true;
   writerSettings.IndentChars = "\t";

   //Create a new XmlWriter to output to
   using (XmlWriter xmlWriter = XmlWriter.Create("New.xml", writerSettings))
   {
      //Start the document
      xmlWriter.WriteStartDocument();

      //Read until the XmlReader reaches the end of the file (essentially !EOF)
      while (xmlRead.Read())
      {
         //FSM to direct the writing of OLD log data to the new file
         switch (xmlRead.NodeType)
         {
            case XmlNodeType.Element:
               //Handle the copying of an element node
               //Contains many if statements to handle the root node &
               //attributes and to skip nodes that contain text
               break;
            case XmlNodeType.Text:
               //Handle the copying of a text node
               break;
            case XmlNodeType.EndElement:
               //Handle the copying of an end element node
               break;
         }
      }

      xmlWriter.WriteEndDocument();
   }
}
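For reference, here is a minimal sketch of how the switch cases above could be filled in. It is not the asker's code: it assumes only elements, attributes and text need to be preserved (comments, CDATA and processing instructions would each need their own case), and `writeNewEntry` is a hypothetical callback that writes the new LogEntry just before the root element is closed. Only one node is held in memory at a time, but the whole file is still rewritten on every append.

```csharp
using System;
using System.IO;
using System.Xml;

static class WorkLogCopier
{
    // Copies 'source' to 'target' node by node and calls 'writeNewEntry'
    // just before the root element is closed.
    // Assumption: only elements, attributes and text need to be preserved.
    public static void CopyAndAppend(TextReader source, TextWriter target,
                                     Action<XmlWriter> writeNewEntry)
    {
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.Indent = true;
        settings.IndentChars = "\t";

        using (XmlReader reader = XmlReader.Create(source))
        using (XmlWriter writer = XmlWriter.Create(target, settings))
        {
            writer.WriteStartDocument();
            while (reader.Read())
            {
                switch (reader.NodeType)
                {
                    case XmlNodeType.Element:
                        // Copy the start tag and all of its attributes
                        writer.WriteStartElement(reader.Prefix, reader.LocalName, reader.NamespaceURI);
                        writer.WriteAttributes(reader, true);
                        // <Foo/> produces no EndElement node, so close it here
                        if (reader.IsEmptyElement)
                        {
                            // An empty root (<Logs/>) still gets the new entry
                            if (reader.Depth == 0)
                                writeNewEntry(writer);
                            writer.WriteEndElement();
                        }
                        break;
                    case XmlNodeType.Text:
                        writer.WriteString(reader.Value);
                        break;
                    case XmlNodeType.EndElement:
                        // Depth 0 means this is the root's end tag: append first
                        if (reader.Depth == 0)
                            writeNewEntry(writer);
                        writer.WriteFullEndElement();
                        break;
                }
            }
            writer.WriteEndDocument();
        }
    }
}
```

Whitespace nodes are skipped on purpose; the writer re-indents the output itself.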

I'm confident I could append to the file this way, but it is highly impractical. Does anyone know of a memory-efficient method that my hours of searching haven't turned up?

I'm happy to post my current code if required, but as I mentioned it is extremely large and pretty nasty at the moment, so I'll leave it out for now.

Upvotes: 4

Answers (5)

atlaste

Reputation: 31116

Your approach of using XmlReader is actually the way to go... but as you also say, it's very impractical.

So is a hack justified?

The reason for this is that XML has a bunch of features that you might encounter, which require you to read it from the top to the bottom. Normally XmlReader copes with these situations, leaving you with plain tags and so on. For example, given the following declarations:

<!ENTITY % pub    "&#xc9;ditions Gallimard" >
<!ENTITY   rights "All rights reserved" >
<!ENTITY   book   "La Peste: Albert Camus, &#xA9; 1947 %pub;. &rights;" >

then the replacement text for the entity book is:

La Peste: Albert Camus,
© 1947 Éditions Gallimard. &rights;

If you haven't read the ENTITY declarations, it's impossible to do the 'translation' to the correct XML. Fortunately, not many people use these kinds of constructions, so when all you want is to rewrite the root's end tag, it's okay to assume your XML doesn't use them.

Also, the only valid end tag in XML is </Foo>, with optional whitespace before the trailing > (see http://www.w3.org/TR/2008/REC-xml-20081126/#sec-starttags). This basically means you can skip to the end of the file, read enough data, and check whether it contains the end tag; if it does, you can insert your own data there. If not, seek back a bit and try again.
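The end-tag rule can be checked byte by byte. Below is a minimal sketch; `IsEndTagAt` is a hypothetical helper, and it assumes an ASCII-compatible encoding such as UTF-8, where '<', '/', '>' and the XML whitespace characters (space, tab, CR, LF) are single bytes. Unlike a plain prefix search for `</Logs`, it also rejects a longer name such as `</LogsX>`.

```csharp
using System;
using System.Text;

static class EndTagMatcher
{
    // Returns true if data[i..] holds "</" + name, optional XML whitespace, then '>'.
    // Assumes an ASCII-compatible encoding such as UTF-8.
    public static bool IsEndTagAt(byte[] data, int i, string name)
    {
        byte[] prefix = Encoding.UTF8.GetBytes("</" + name);
        if (i < 0 || i + prefix.Length > data.Length)
            return false;
        for (int j = 0; j < prefix.Length; j++)
            if (data[i + j] != prefix[j])
                return false;

        // Skip the optional whitespace allowed before '>'
        int k = i + prefix.Length;
        while (k < data.Length && (data[k] == (byte)' ' || data[k] == (byte)'\t'
                                   || data[k] == (byte)'\r' || data[k] == (byte)'\n'))
            k++;
        return k < data.Length && data[k] == (byte)'>';
    }
}
```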

Nasty little encodings

The last thing to be aware of is the encoding of your file. While you can construct an XmlTextReader from a stream, the stream consists of bytes, and your reader detects the encoding and decodes them. Fortunately, XmlTextReader exposes the detected encoding through its Encoding property, so you can use that. Encoding matters because a character may need more than one byte; with UTF-16 or UTF-32 in particular this is an issue. The way to handle this is to convert your token to bytes with that encoding and then do the matching on bytes.
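To see why the token has to be encoded with the file's own encoding, compare the bytes of the same end tag in UTF-8 and UTF-16. A tiny sketch; `Logs` is just an example root name, and the class and method names are made up for illustration:

```csharp
using System;
using System.Text;

static class EndTagEncoding
{
    // The byte pattern to search for depends on the file's encoding:
    // pass the Encoding reported by the reader (e.g. XmlTextReader.Encoding).
    public static byte[] EndTagBytes(string rootElement, Encoding encoding)
    {
        return encoding.GetBytes("</" + rootElement);
    }

    // Hex dump helper, handy when comparing patterns
    public static string Hex(byte[] bytes)
    {
        return BitConverter.ToString(bytes);
    }
}
```

`EndTagBytes("Logs", Encoding.UTF8)` gives the 6 bytes 3C-2F-4C-6F-67-73, whereas `Encoding.Unicode` (UTF-16 little-endian) gives 12 bytes, 3C-00-2F-00-4C-00-6F-00-67-00-73-00, so a UTF-8 pattern would never match inside a UTF-16 file.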

The root = the trailer assumption

Since I don't really feel like checking the spaces and trailing '>' (see W3C link above), I also assume it's a valid XML file, which means that every opening tag is closed as well. This means you can simply check for </root, making the matching process a bit easier. (NOTE: you might even just check for the last </ in the file, but I prefer my code to be a bit more robust against incorrect XML)

Putting it all together

Here goes... (I haven't tested it, but it should more or less work)

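using System;
using System.IO;
using System.Text;
using System.Xml;

public bool FindAppendPoint(Stream stream)
{
    // The stream must be readable, writable and seekable (e.g. a FileStream
    // opened with FileAccess.ReadWrite), because AppendXml writes to it.
    XmlTextReader xr = new XmlTextReader(stream);
    string rootElement = null;
    while (xr.Read())
    {
        if (xr.NodeType == XmlNodeType.Element)
        {
            rootElement = xr.Name;
            break;
        }
    }

    if (rootElement == null)
    {
        // Well, apparently there's no root... You can start a new file I suppose
        return false;
    }
    else
    {
        // XmlTextReader reads ahead in blocks, so stream.Position is NOT the end of
        // the root's start tag (for a small file it may already be the end of the
        // file). Since we search backwards for the LAST occurrence of the end tag,
        // 0 is a safe lower bound.
        long start = 0;
        long len = stream.Length;
        long end = Math.Max(start, len - 1024);

        Encoding encoding = xr.Encoding;
        byte[] endTag = encoding.GetBytes("</" + rootElement);

        while (end >= start)
        {
            // Read everything from 'end' to the end of the file;
            // Read may return fewer bytes than requested, so loop
            byte[] data = new byte[len - end];
            stream.Seek(end, SeekOrigin.Begin);
            int read = 0;
            int n;
            while (read < data.Length && (n = stream.Read(data, read, data.Length - read)) > 0)
            {
                read += n;
            }

            // Loop backwards till we find the end tag
            for (int i = read - endTag.Length; i >= 0; --i)
            {
                int j;
                for (j = 0; j < endTag.Length && endTag[j] == data[i + j]; ++j) { }
                if (j == endTag.Length)
                {
                    // We found a match! Position the stream at the '<' of the end tag
                    stream.Seek(end + i, SeekOrigin.Begin);
                    // AppendXml (not shown) should write the new element(s), then the
                    // root's end tag, and truncate whatever is left after it
                    AppendXml(stream, encoding);
                    return true;
                }
            }

            // Hmm, no </root in this block: lots of trailing whitespace, or no end tag at all... oh well
            //
            // It's okay to skip back a bit, just have to make sure that we don't skip <0
            if (end == start)
            {
                end = start - 1; // end the loop
            }
            else
            {
                end = Math.Max(start, end - 1024);
            }
        }

        // Nope, no go.
        return false;
    }
}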
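The code above calls `AppendXml(stream, xr.Encoding)` without defining it. One possible shape for that step is sketched below; the name `WriteEntryAndCloseRoot` and its extra parameters (the new entry's markup and the root name) are hypothetical. It assumes the stream is writable, seekable, and positioned at the `<` of the root's end tag.

```csharp
using System.IO;
using System.Text;

static class AppendHelper
{
    // Hypothetical helper: overwrites the root end tag at the stream's current
    // position with 'entryXml' followed by a fresh "</root>", then cuts off the old tail.
    public static void WriteEntryAndCloseRoot(Stream stream, Encoding encoding,
                                              string entryXml, string rootElement)
    {
        byte[] payload = encoding.GetBytes(entryXml + "</" + rootElement + ">");
        stream.Write(payload, 0, payload.Length);
        // The old end tag plus trailing whitespace may be longer than what we wrote
        stream.SetLength(stream.Position);
        stream.Flush();
    }
}
```

Truncating with `SetLength` matters when the old end tag was followed by more trailing whitespace than the new data covers; otherwise stale bytes would remain after the new end tag.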

Upvotes: 1

Ofir

Reputation: 2194

Assume that the log file looks like this (only two levels):

<Logs>
    <Log>abc1</Log>
    <Log>abc1</Log>
    <Log>abc1</Log>
</Logs>

I used a FileStream to seek to the end of the file and read the closing element backwards.

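using System.Collections.Generic;
using System.IO;
using System.Text;

// Assumes the file is UTF-8 and ends with the root's end tag (optionally followed by whitespace)
private static void Append(string xmlElement)
{
    const byte lessThan = (byte) '<';
    using (FileStream stream = File.Open(@"C:\log.xml", FileMode.OpenOrCreate))
    {
        // Empty file: create the root element first
        if (stream.Length == 0)
        {
            byte[] rootElement = Encoding.UTF8.GetBytes("<Logs></Logs>");
            stream.Write(rootElement, 0, rootElement.Length);
        }

        // Walk backwards from the end, collecting bytes until we reach the '<' of "</Logs>"
        List<byte> buffer = new List<byte>();
        stream.Seek(0, SeekOrigin.End);
        do
        {
            if (stream.Position == 0)
                throw new IOException("No closing tag found in the log file.");
            stream.Seek(-1, SeekOrigin.Current);
            buffer.Insert(0, (byte) stream.ReadByte());
            stream.Seek(-1, SeekOrigin.Current);
        } while (buffer[0] != lessThan);

        // The stream is now at '<': overwrite the end tag with the new element,
        // then write the saved end tag (and any trailing bytes) back after it
        byte[] toAdd = Encoding.UTF8.GetBytes(xmlElement);
        stream.Write(toAdd, 0, toAdd.Length);
        stream.Write(buffer.ToArray(), 0, buffer.Count);
    }
}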

Upvotes: 1

Regfor

Reputation: 8091

Only XmlReader lets you process XML without loading the complete document into memory. It doesn't support modification, but you can copy the XML from the source document and apply your modifications along the way. There is no other way.

And parsing XML as a text document looks like the hard way.

It's better to use the XmlReader/XmlWriter classes, where the parsing and CRUD logic is already implemented, together with your own classes built on either the Visitor or the State GoF pattern. The Visitor pattern will reduce the number of ifs and make your design easily extensible. Even if you would otherwise prefer not to parse the XML document with XmlReader/XmlWriter, I recommend using them in this situation.
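As another way to cut down the ifs with the existing classes (this is not a Visitor implementation, just a different technique), `XmlWriter.WriteNode` can copy each child of the root as a whole subtree, so only the root itself needs special handling. A minimal sketch, assuming the new entry is written through a hypothetical `writeNewEntry` callback; whitespace is dropped on read and re-indented on write:

```csharp
using System;
using System.IO;
using System.Xml;

static class SubtreeCopier
{
    // Streams 'source' to 'target': copies the root start tag, lets
    // XmlWriter.WriteNode copy each child subtree, then appends the new
    // entry and closes the root.
    public static void CopyAndAppend(TextReader source, TextWriter target,
                                     Action<XmlWriter> writeNewEntry)
    {
        XmlReaderSettings readerSettings = new XmlReaderSettings();
        readerSettings.IgnoreWhitespace = true;
        XmlWriterSettings writerSettings = new XmlWriterSettings();
        writerSettings.Indent = true;
        writerSettings.IndentChars = "\t";

        using (XmlReader reader = XmlReader.Create(source, readerSettings))
        using (XmlWriter writer = XmlWriter.Create(target, writerSettings))
        {
            writer.WriteStartDocument();

            // Skip the declaration and any comments, landing on the root element
            reader.MoveToContent();
            bool rootIsEmpty = reader.IsEmptyElement;
            writer.WriteStartElement(reader.Prefix, reader.LocalName, reader.NamespaceURI);
            writer.WriteAttributes(reader, true);
            reader.Read(); // step into the root (or past it, if it is empty)

            if (!rootIsEmpty)
            {
                // WriteNode copies the current node (a whole subtree for an element)
                // and leaves the reader on the next sibling or the root's end tag
                while (!reader.EOF && reader.NodeType != XmlNodeType.EndElement)
                {
                    writer.WriteNode(reader, true);
                }
            }

            writeNewEntry(writer);
            writer.WriteEndElement(); // close the root
            writer.WriteEndDocument();
        }
    }
}
```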

Upvotes: 1

Miserable Variable

Reputation: 28752

If a hack is justified, I would go to the end of the file, rewind past the end tag, and write the new element followed by the end tag. As a further improvement, you could even cache the offset of the beginning of the last element.
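A minimal sketch of the caching idea, under several assumptions: the file is UTF-8, only this class writes to it, and it always ends exactly with `</Logs>` (no trailing newline). `CachedLogAppender` is a hypothetical name, and what it caches is the offset of the root's end tag, which is where the next element will begin. The offset lives only in memory; persisting it across restarts is left out.

```csharp
using System;
using System.IO;
using System.Text;

class CachedLogAppender
{
    private readonly string path;
    private readonly byte[] closingTag = Encoding.UTF8.GetBytes("</Logs>");
    private long endTagOffset = -1; // cached position of "</Logs>", -1 = unknown

    public CachedLogAppender(string path)
    {
        this.path = path;
    }

    public void Append(string entryXml)
    {
        using (FileStream stream = new FileStream(path, FileMode.OpenOrCreate, FileAccess.ReadWrite))
        {
            if (stream.Length == 0)
            {
                // New file: write the root start tag; the end tag follows the entry
                byte[] root = Encoding.UTF8.GetBytes("<Logs>");
                stream.Write(root, 0, root.Length);
                endTagOffset = stream.Position;
            }
            else if (endTagOffset < 0)
            {
                // Nothing cached yet: assume the file ends exactly with "</Logs>"
                endTagOffset = stream.Length - closingTag.Length;
            }

            // Overwrite the old end tag with the entry, then re-close the root
            stream.Seek(endTagOffset, SeekOrigin.Begin);
            byte[] entry = Encoding.UTF8.GetBytes(entryXml);
            stream.Write(entry, 0, entry.Length);
            endTagOffset = stream.Position; // the next entry goes here
            stream.Write(closingTag, 0, closingTag.Length);
            stream.SetLength(stream.Position);
        }
    }
}
```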

Upvotes: 2

Eugene

Reputation: 325

If you know your XML structure, consider using a stream writer:

1. Open the file as a FileStream.
2. Move the position to the tag you want to replace (the closing root tag), placing it at that tag's "<".
3. Write your log data in the correct XML format, and write the closing root tag again at the end of the write.

In other words: "process the XML file as you would with a text editor".

Upvotes: 3
