Max Malyk

Reputation: 860

Flatten xml with text and element nodes using LINQ to XML

I need to process/flatten the incoming xml in a certain fashion.

Source xml:

<paragraph>
        This <content styleCode="Underline">is</content> 
        <content styleCode="Italic">
            <content styleCode="Underline">
                <content styleCode="Bold">hello</content> world 
            </content> test</content> <content styleCode="Bold">example</content> here.
    </paragraph>

Target xml:

<paragraph>
    This <content styleCode="underline">is</content> <content styleCode="italic underline bold">hello</content> <content styleCode="italic underline">world</content> <content styleCode="italic">test</content> <content styleCode="Bold">example</content> here.
</paragraph>

I would prefer to use LINQ to XML, but the text nodes that sit next to the content element nodes make this a much harder task.

Another idea was to rewrite the inner XML at every step with regular expressions: insert a closing </content> before each child node and an opening <content> immediately after it, update the styleCode attributes accordingly, then call AddBeforeSelf and remove the old node. I could not get that approach to work either.

Any ideas or solutions are very much appreciated.

Besides combining and flattening the content nodes, I also have to lowercase the combined styleCode attributes. That is the easy part:

XDocument xml = XDocument.Parse(sourceXml);
XName contentNode = XName.Get("content", "mynamespace");
var contentNodes = xml.Descendants(contentNode);
var renames = contentNodes.Where(x => x.Attribute("styleCode") != null);
foreach (XElement node in renames.ToArray())
{
    node.Attribute("styleCode").Value = node.Attribute("styleCode").Value.ToLower();
}

Upvotes: 2

Views: 693

Answers (1)

SergeyS

Reputation: 3553

You can do it recursively: walk from node to node collecting styles, and whenever you reach a text node, wrap it in a content tag that carries all the styles collected so far. Code below:

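// Requires: using System; using System.Collections.Generic; using System.Linq;
//           using System.Xml; using System.Xml.Linq;
static void MergeStyles(string xml)
{
    XDocument doc = XDocument.Parse(xml);
    target = "";  // reset the accumulator so repeated calls do not append to old output
    Go(doc.Root, new List<string>());
    Console.WriteLine(target);  // prints the flattened children, without the root element itself
}

// Accumulates the flattened inner markup of the root element.
static string target = "";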
static void Go(XElement node, List<string> styles)
{
    foreach (var child in node.Nodes())
    {
        if (child.NodeType == XmlNodeType.Text)
        {
            if (styles.Count > 0)
            {
                target += string.Format(
                    "<content styleCode=\"{0}\">{1}</content>",
                    string.Join(" ", styles.Select(s => s.ToLower())),
                    child.ToString(SaveOptions.DisableFormatting));
            }
            else
            {
                target += child.ToString(SaveOptions.DisableFormatting);
            }
        }
        else if (child.NodeType == XmlNodeType.Element)
        {
            var element = (XElement)child;
            if (element.Name == "content")
            {
                string style = element.Attributes("styleCode").Single().Value;
                styles.Add(style);
                Go(element, styles);
                styles.RemoveAt(styles.Count - 1);
            }
        }
    }
}
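
The same recursive walk can also build a new XElement tree instead of concatenating strings, which keeps the root element and leaves escaping to LINQ to XML. The following is only a sketch under some assumptions: the names StyleFlattener, Flatten and Collect are made up for illustration, the elements are matched by local name "content" regardless of namespace, a content element without a styleCode attribute contributes no style, and other elements are copied unchanged (the code above skips them).

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

static class StyleFlattener
{
    // Hypothetical helper: returns a copy of root in which nested <content>
    // elements are replaced by flat <content> wrappers around each text node.
    public static XElement Flatten(XElement root)
    {
        var result = new XElement(root.Name, root.Attributes());
        Collect(root, new List<string>(), result);
        return result;
    }

    static void Collect(XElement node, List<string> styles, XElement output)
    {
        foreach (XNode child in node.Nodes())
        {
            if (child is XText text)
            {
                if (styles.Count == 0)
                {
                    // Text outside any content element is copied as-is.
                    output.Add(new XText(text.Value));
                }
                else
                {
                    // styles is only non-empty inside a content element,
                    // so node.Name is the (possibly namespaced) content name.
                    output.Add(new XElement(node.Name,
                        new XAttribute("styleCode", string.Join(" ", styles)),
                        text.Value));
                }
            }
            else if (child is XElement element)
            {
                if (element.Name.LocalName == "content")
                {
                    // Assumption: a missing styleCode simply adds no style.
                    string style = (string)element.Attribute("styleCode");
                    if (style != null) styles.Add(style.ToLowerInvariant());
                    Collect(element, styles, output);
                    if (style != null) styles.RemoveAt(styles.Count - 1);
                }
                else
                {
                    // Assumption: non-content elements are kept unchanged.
                    output.Add(new XElement(element));
                }
            }
        }
    }
}

static class Program
{
    static void Main()
    {
        XElement source = XElement.Parse(
            "<p>a <content styleCode=\"Italic\">" +
            "<content styleCode=\"Bold\">b</content> c</content></p>");
        XElement flat = StyleFlattener.Flatten(source);
        Console.WriteLine(flat.ToString(SaveOptions.DisableFormatting));
        // <p>a <content styleCode="italic bold">b</content><content styleCode="italic"> c</content></p>
    }
}
```

Like the string version, this keeps the whitespace inside each text node, so " c" stays " c". Getting the exact target shown in the question would need an extra step that moves leading and trailing spaces outside the wrapper elements.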

Upvotes: 2
