e1s

Reputation: 375

Replace part of large XML file

I have a large XML file, and I need to replace every element with a given name (together with all of its child elements) with another element. For example, if the element is e:

<a>
<b></b>
<e>
   <b></b>
   <c></c>
</e>
</a>

After replacing e with elem:

<a>
<b></b>
<elem></elem>
</a>

Update: I tried using XDocument, but the XML file is larger than 2 GB and I get a System.OutOfMemoryException.

Update 2: my code, which does not transform the XML:

XmlReader reader = XmlReader.Create("xml_file.xml");
XmlWriter wr = XmlWriter.Create(Console.Out);
while (reader.Read())
{
    if (reader.NodeType == XmlNodeType.Element && reader.Name == "e")
    {
        wr.WriteElementString("elem", "val1");
        reader.ReadSubtree();
    }
    wr.WriteNode(reader, false);
}
wr.Close();

Update 3: the e element can also be nested inside other elements:

<a>
<b></b>
<e>
   <b></b>
   <c></c>
</e>
<i>
  <e>
    <b></b>
    <c></c>
  </e>
</i> 
</a>

Upvotes: 2

Views: 2426

Answers (5)

Charles Mager

Reputation: 26213

Taking inspiration from this blog post, you can basically just stream the contents of the XmlReader straight to the XmlWriter similarly to your example code, but handling all node types. Using WriteNode, as in your example code, will add the node and all child nodes, so you wouldn't be able to handle each descendant in your source XML.

In addition, you need to make sure you read to the end of the element you want to skip - ReadSubtree creates an XmlReader for this, but it doesn't actually do any reading. You need to ensure this is read to the end.

The resulting code might look like this:

// xml holds the source document; rs and ws are XmlReaderSettings and XmlWriterSettings instances.
using (var reader = XmlReader.Create(new StringReader(xml), rs))
using (var writer = XmlWriter.Create(Console.Out, ws))
{
    while (reader.Read())
    {
        switch (reader.NodeType)
        {
            case XmlNodeType.Element:
                if (HandleElement(reader, writer))
                {
                    // Consume the handled element's whole subtree so it is not copied to the output.
                    ReadToEnd(reader.ReadSubtree());
                }
                else
                {
                    writer.WriteStartElement(reader.Prefix, reader.LocalName, reader.NamespaceURI);
                    writer.WriteAttributes(reader, true);
                    if (reader.IsEmptyElement)
                    {
                        writer.WriteEndElement();
                    }
                }
                break;
            case XmlNodeType.Text:
                writer.WriteString(reader.Value);
                break;
            case XmlNodeType.Whitespace:
            case XmlNodeType.SignificantWhitespace:
                writer.WriteWhitespace(reader.Value);
                break;
            case XmlNodeType.CDATA:
                writer.WriteCData(reader.Value);
                break;
            case XmlNodeType.EntityReference:
                writer.WriteEntityRef(reader.Name);
                break;
            case XmlNodeType.XmlDeclaration:
            case XmlNodeType.ProcessingInstruction:
                writer.WriteProcessingInstruction(reader.Name, reader.Value);
                break;
            case XmlNodeType.DocumentType:
                writer.WriteDocType(reader.Name, reader.GetAttribute("PUBLIC"), reader.GetAttribute("SYSTEM"), reader.Value);
                break;
            case XmlNodeType.Comment:
                writer.WriteComment(reader.Value);
                break;
            case XmlNodeType.EndElement:
                writer.WriteFullEndElement();
                break;
        }
    }    
}

private static void ReadToEnd(XmlReader reader)
{
    while (!reader.EOF)
    {
        reader.Read();
    }
}

Obviously put whatever your logic is inside HandleElement, returning true if the element is handled (and therefore to be ignored). The implementation for the logic in your example code would be:

private static bool HandleElement(XmlReader reader, XmlWriter writer)
{
    if (reader.Name == "e")
    {
        writer.WriteElementString("elem", "val1");
        return true;
    }

    return false;
}

Here is a working demo: https://dotnetfiddle.net/FFIBU4
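
To illustrate the point about ReadSubtree, here is a minimal sketch, assuming a small in-memory document with no whitespace between tags. The class SubtreeDemo and the method DescribePositionAfterSkip are hypothetical names used only for this illustration. It shows that the outer reader moves past the element only once the subtree reader has been read to the end.

```csharp
using System;
using System.IO;
using System.Xml;

public static class SubtreeDemo
{
    // Hypothetical helper (not part of the answer above): moves to the first <e>,
    // drains its subtree reader, and reports where the outer reader ends up.
    public static string DescribePositionAfterSkip(string xml)
    {
        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            if (!reader.ReadToFollowing("e"))
            {
                return "no <e> found";
            }

            // ReadSubtree only creates the nested reader; nothing is consumed yet.
            using (var subtree = reader.ReadSubtree())
            {
                while (subtree.Read())
                {
                    // Discard every node inside <e>...</e>.
                }
            }

            // The outer reader is now positioned on the end tag of <e>.
            string atEnd = reader.NodeType + ":" + reader.Name;

            // The next Read() moves on to whatever follows the skipped element.
            reader.Read();
            return atEnd + " -> " + reader.NodeType + ":" + reader.Name;
        }
    }

    public static void Main()
    {
        Console.WriteLine(
            DescribePositionAfterSkip("<a><b></b><e><b></b><c></c></e><d></d></a>"));
    }
}
```

For a non-empty e element the outer reader ends up on the end tag of e, which is why the main loop can simply continue with its next Read() call.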

Upvotes: 4

Arie

Reputation: 5373

// example data:
XDocument xmldoc = XDocument.Parse(
@"
<a>
<b></b>
<e>
   <b></b>
   <c></c>
</e>
<c />
<e>
   <b></b>
   <c></c>
   <c></c>
</e>
</a>
");
// You can use XPath; for that you need to add:
// using System.Xml.XPath;
List<XElement> elementsToReplace = xmldoc.XPathSelectElements("a/e").ToList();

// or pure LINQ to XML:
// elementsToReplace = xmldoc.Elements("a").Elements("e").ToList();

foreach (XElement elem in elementsToReplace)
{
    // Setting the value of the XElement to an empty string makes the resulting XML look like this:
    // <elem></elem>
    // and not like this:
    // <elem />
    elem.ReplaceWith(new XElement("elem", ""));
    // If you don't mind self-closing tags, then:
    // elem.ReplaceWith(new XElement("elem"));
}

I haven't measured the performance, but rumour has it that the difference between the XPath and LINQ approaches is not very significant.

XPath syntax, if you need it: http://www.w3schools.com/xpath/xpath_syntax.asp

Upvotes: 0

Sky Fang

Reputation: 1101

string xml = @"<a>
<b></b>
<e>
<b></b>
<c></c>
</e>
</a>";
string pattern = @"<e[^>]*>[\s\S]*?(((?'Open'<e[^>]*>)[\s\S]*?)+((?'-Open'</e>)[\s\S]*?)+)*(?(Open)(?!))</e>";
Console.WriteLine(Regex.Replace(xml, pattern, "<elem></elem>"));

You can use a regex; LINQ to XML would also work.

Upvotes: 0

raduchept

Reputation: 301

Try this (I saw the C# tag :D):

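        // xDocument is the XDocument that has already been loaded.
        XElement elem = new XElement("elem");
        // Materialize the query first so the tree is not modified while it is being enumerated.
        List<XElement> listElementsToBeReplaced = xDocument.Descendants("e").ToList();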
        foreach (XElement replaceElement in listElementsToBeReplaced)
        {
            replaceElement.AddAfterSelf(elem);
        }
        listElementsToBeReplaced.Remove();

Upvotes: 1

Hugo

Reputation: 109

I would replace it with a regular expression that matches each e element together with all of its content up to the closing tag, and replaces it with the new elem element. This way you can do it in any editor whose search/replace supports regular expressions, and programmatically in any language.
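
As a rough sketch of this idea in C#, the snippet below assumes that the XML is already held in a string, that e elements are never nested inside one another, and that they are never written in the self-closing form. The class RegexReplaceSketch and the method ReplaceE are hypothetical names used only for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexReplaceSketch
{
    // Assumptions: <e> elements are not nested in other <e> elements
    // and are never written as <e/>.
    // \b keeps the pattern from matching longer names such as <elem>.
    private static readonly Regex EElement =
        new Regex(@"<e\b[^>]*>.*?</e>", RegexOptions.Singleline);

    // Replaces every matched <e>...</e> span with an empty <elem> element.
    public static string ReplaceE(string xml)
    {
        return EElement.Replace(xml, "<elem></elem>");
    }

    public static void Main()
    {
        Console.WriteLine(ReplaceE("<a><b></b><e><b></b><c></c></e></a>"));
    }
}
```

RegexOptions.Singleline lets the dot match newlines, so an e element that spans several lines is still replaced as a whole.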

Upvotes: 0
