Reputation: 375
I have a large XML file, and I need to replace every element with a given name (together with all of its inner elements) with another element. For example, if the element is e:
<a>
<b></b>
<e>
<b></b>
<c></c>
</e>
</a>
After replacing e with elem:
<a>
<b></b>
<elem></elem>
</a>
Update: I tried to use XDocument, but the XML is larger than 2 GB and I get a System.OutOfMemoryException.
Update 2: here is my code, but it does not transform the XML:
XmlReader reader = XmlReader.Create("xml_file.xml");
XmlWriter wr = XmlWriter.Create(Console.Out);
while (reader.Read())
{
if (reader.NodeType == XmlNodeType.Element && reader.Name == "e")
{
wr.WriteElementString("elem", "val1");
reader.ReadSubtree();
}
wr.WriteNode(reader, false);
}
wr.Close();
Update 3: e can also appear nested inside other elements:
<a>
<b></b>
<e>
<b></b>
<c></c>
</e>
<i>
<e>
<b></b>
<c></c>
</e>
</i>
</a>
Upvotes: 2
Views: 2426
Reputation: 26213
Taking inspiration from this blog post, you can basically just stream the contents of the XmlReader straight to the XmlWriter, similarly to your example code, but handling all node types. Using WriteNode, as in your example code, will add the node and all of its child nodes, so you wouldn't be able to handle each descendant in your source XML.
In addition, you need to make sure you read to the end of the element you want to skip. ReadSubtree creates an XmlReader for this, but it doesn't actually do any reading. You need to ensure this reader is read to the end.
The resulting code might look like this:
using (var reader = XmlReader.Create(new StringReader(xml), rs))
using (var writer = XmlWriter.Create(Console.Out, ws))
{
while (reader.Read())
{
switch (reader.NodeType)
{
case XmlNodeType.Element:
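if (HandleElement(reader, writer))
{
// ReadSubtree only creates a reader; drain it so the outer reader skips the whole element.
// It is created only for handled elements, so the outer reader is never used while a subtree reader is open.
using (var subTreeReader = reader.ReadSubtree())
{
ReadToEnd(subTreeReader);
}
}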
else
{
writer.WriteStartElement(reader.Prefix, reader.LocalName, reader.NamespaceURI);
writer.WriteAttributes(reader, true);
if (reader.IsEmptyElement)
{
writer.WriteEndElement();
}
}
break;
case XmlNodeType.Text:
writer.WriteString(reader.Value);
break;
case XmlNodeType.Whitespace:
case XmlNodeType.SignificantWhitespace:
writer.WriteWhitespace(reader.Value);
break;
case XmlNodeType.CDATA:
writer.WriteCData(reader.Value);
break;
case XmlNodeType.EntityReference:
writer.WriteEntityRef(reader.Name);
break;
case XmlNodeType.XmlDeclaration:
case XmlNodeType.ProcessingInstruction:
writer.WriteProcessingInstruction(reader.Name, reader.Value);
break;
case XmlNodeType.DocumentType:
writer.WriteDocType(reader.Name, reader.GetAttribute("PUBLIC"), reader.GetAttribute("SYSTEM"), reader.Value);
break;
case XmlNodeType.Comment:
writer.WriteComment(reader.Value);
break;
case XmlNodeType.EndElement:
writer.WriteFullEndElement();
break;
}
}
}
private static void ReadToEnd(XmlReader reader)
{
while (!reader.EOF)
{
reader.Read();
}
}
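To make the point about ReadSubtree concrete, here is a minimal sketch (the class and method names are made up for illustration) that skips an e element by draining its subtree reader and then reports where the outer reader ends up. It assumes the documented behaviour that, once the subtree reader is closed, the original reader is positioned on the EndElement of the skipped element.

```csharp
using System;
using System.IO;
using System.Xml;

static class ReadSubtreeDemo
{
    // Hypothetical helper: returns the outer reader's position right after
    // skipping <e>, followed by its position after one more Read().
    public static string PositionAfterSkip(string xml)
    {
        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.ReadToFollowing("e");              // outer reader is now on <e>
            using (var subtree = reader.ReadSubtree())
            {
                // ReadSubtree consumed nothing yet; drain it to move past the children.
                while (subtree.Read()) { }
            }
            string first = reader.NodeType + ":" + reader.Name;
            reader.Read();                            // what the main loop's Read() would do next
            string second = reader.NodeType + ":" + reader.Name;
            return first + "|" + second;
        }
    }

    static void Main()
    {
        Console.WriteLine(PositionAfterSkip("<a><e><b></b><c></c></e><d></d></a>"));
    }
}
```

For this input it should print EndElement:e|Element:d, which is why the main loop can simply call Read() again after a handled element without writing the closing tag of e.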
Put whatever your logic is inside HandleElement, returning true if the element has been handled (so that its original content is skipped). The implementation for the logic in your example code would be:
private static bool HandleElement(XmlReader reader, XmlWriter writer)
{
if (reader.Name == "e")
{
writer.WriteElementString("elem", "val1");
return true;
}
return false;
}
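The snippet above uses xml, rs and ws without defining them. As a rough end-to-end sketch, the same idea can be wired up as below. The class name StreamingReplace, the Replace signature, and the choice of OmitXmlDeclaration are my assumptions, not part of the original answer. It also handles only elements, text and whitespace for brevity, and writes an empty replacement element, as in the question's expected output.

```csharp
using System;
using System.IO;
using System.Xml;

static class StreamingReplace
{
    // Hypothetical helper: copies input to output, replacing every element
    // named oldName (and everything inside it) with an empty newName element.
    public static void Replace(TextReader input, TextWriter output, string oldName, string newName)
    {
        var ws = new XmlWriterSettings { OmitXmlDeclaration = true }; // assumed settings
        using (var reader = XmlReader.Create(input, new XmlReaderSettings()))
        using (var writer = XmlWriter.Create(output, ws))
        {
            while (reader.Read())
            {
                switch (reader.NodeType)
                {
                    case XmlNodeType.Element:
                        if (reader.LocalName == oldName)
                        {
                            writer.WriteStartElement(newName);
                            writer.WriteFullEndElement();      // emits <elem></elem>
                            using (var subtree = reader.ReadSubtree())
                            {
                                while (subtree.Read()) { }     // skip the old content
                            }
                        }
                        else
                        {
                            bool isEmpty = reader.IsEmptyElement;
                            writer.WriteStartElement(reader.Prefix, reader.LocalName, reader.NamespaceURI);
                            writer.WriteAttributes(reader, true);
                            if (isEmpty) writer.WriteEndElement();
                        }
                        break;
                    case XmlNodeType.Text:
                        writer.WriteString(reader.Value);
                        break;
                    case XmlNodeType.Whitespace:
                    case XmlNodeType.SignificantWhitespace:
                        writer.WriteWhitespace(reader.Value);
                        break;
                    case XmlNodeType.EndElement:
                        writer.WriteFullEndElement();
                        break;
                }
            }
        }
    }

    static void Main()
    {
        const string xml = "<a><b></b><e><b></b><c></c></e><i><e><b></b><c></c></e></i></a>";
        var output = new StringWriter();
        Replace(new StringReader(xml), output, "e", "elem");
        Console.WriteLine(output.ToString());
    }
}
```

For a file that is several gigabytes in size you would pass something like File.OpenText(inputPath) and a StreamWriter over the output file instead of the in-memory reader and writer, so that only the current node is held in memory.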
Here is a working demo: https://dotnetfiddle.net/FFIBU4
Upvotes: 4
Reputation: 5373
// example data:
XDocument xmldoc = XDocument.Parse(
@"
<a>
<b></b>
<e>
<b></b>
<c></c>
</e>
<c />
<e>
<b></b>
<c></c>
<c></c>
</e>
</a>
");
// you can use xpath, then you need to add:
// using System.Xml.XPath;
List<XElement> elementsToReplace = xmldoc.XPathSelectElements("a/e").ToList();
// or pure LINQ to XML:
// elementsToReplace = xmldoc.Elements("a").Elements("e").ToList();
foreach (XElement elem in elementsToReplace)
{
// setting Value of XElement to an empty string causes the resulting xml to look like this:
// <elem></elem>
// and not like this:
// <elem />
elem.ReplaceWith(new XElement("elem", ""));
// if you don't mind self closing tags, then:
// elem.ReplaceWith(new XElement("elem"));
}
I didn't measure the performance, but rumour has it that the difference between the XPath and LINQ versions is not very significant.
XPath syntax, if you need it: http://www.w3schools.com/xpath/xpath_syntax.asp
Upvotes: 0
Reputation: 1101
string xml = @"<a>
<b></b>
<e>
<b></b>
<c></c>
</e>
</a>";
string pattern = @"<e[^>]*>[\s\S]*?(((?'Open'<e[^>]*>)[\s\S]*?)+((?'-Open'</e>)[\s\S]*?)+)*(?(Open)(?!))</e>";
Console.WriteLine(Regex.Replace(xml, pattern, "<ele></ele>"));
Use a regex; you can also use LINQ to XML.
Upvotes: 0
Reputation: 301
Try this (I saw the C# tag :D):
XElement elem = new XElement("elem");
IEnumerable<XElement> listElementsToBeReplaced = xDocument.Descendants("e");
foreach (XElement replaceElement in listElementsToBeReplaced)
{
replaceElement.AddAfterSelf(elem);
}
listElementsToBeReplaced.Remove();
Upvotes: 1
Reputation: 109
I would replace it with a regular expression that matches each e element with all of its content, up to and including the closing tag, and replaces it with the new elem element. This way you can do it in any editor whose search/replace supports regular expressions, and programmatically in any language.
Upvotes: 0