Pomster

Reputation: 15207

Xml to Text Convert

I would like to write something in C# that takes XML and converts it to plain text. For example, this:

<note>
    <to>Tove</to>
    <from>Jani</from>
    <heading>Reminder</heading>
    <body>Don't forget me this weekend!</body>
</note>

Would become:

To Tove
From Jani
Heading Reminder
Body Don't forget me this weekend!

Is there anything like this already, and how would I go about doing it?

This is just roughly the idea I'm going for; it still needs a lot of work:

private void dataGridViewResult_SelectionChanged(object sender, EventArgs e)
{
    if (this.dataGridViewResult.SelectedRows.Count > 0)
    {
        XslCompiledTransform xslt = new XslCompiledTransform();
        xslt.Load("SQL"); //.xml

        xslt.Transform("SQL", "SQL"); //.xml, .html
        this.richTextBoxSQL.Text = this.dataGridViewResult.SelectedRows[0].Cells["SQL"].Value.ToString();
    }
}

Upvotes: 6

Views: 18292

Answers (4)

ThunderGr

Reputation: 2367

The answers given here work perfectly for simple XML files without any attributes. To handle XML that may contain both attributes and values in its nodes, you need something more complex, not to mention "list items" (items that have the same node name but different attributes under the same group, e.g. <Books><Book ID="10" Author="Me" /><Book ID="20" Author="you"/></Books>).

I needed to convert XML files to tab-delimited text, and I had to cover all of these cases. The code has been tested on a few XML files with several thousand nodes each and works. "List items" are handled by appending each attribute or element value of the list to the existing column, using "|" as the separator.

I hope it helps others who want to do something similar and need a starting point. If someone reading this knows of a better way to do it, I would really like to hear about it.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using System.Xml;

public class FileProcessing
{
    static public void ConvertXmlFile(String theFile)
    {
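        // ChangeExtension only touches the extension, so a ".xml" elsewhere in the path is left alone
        string theExportFile = Path.ChangeExtension(theFile, ".txt");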
        List<List<String>> theColumns = new List<List<string>>();
        List<String> theHeader = new List<string>();
        List<String> CurrentDepth = new List<string>();
        XmlDocument theXml = new XmlDocument();
        try
        {
            theXml.Load(theFile);
        }
        catch (Exception)
        {
            // Handle as you see fit. I call an error reporting system here.
            return;
        }
        XmlElement theElement = theXml.DocumentElement;
        CurrentDepth.Add(theElement.ParentNode.Name);//Start at document level
        String ItemGroupTag = "";//this stores the element that groups all items(This element represents the complete line)
        String LastTagName = "";//detect if there is a listitem case(item with same tag appearing in a grouptag having different attribute values).
        int GroupItemDepth = 0;
        do
        {
            //null can occur on various errors. When the depth falls below GroupItemDepth, the search is over
            if (theElement == null || CurrentDepth.Count <= GroupItemDepth) break;
            String CurrentTagName = theElement.ParentNode.Name + "." + theElement.Name;
            //only advance if this is not a listitem or the ItemGroupTag
            if (CurrentDepth[CurrentDepth.Count - 1] == CurrentTagName && LastTagName != CurrentTagName)
            {
                LastTagName = CurrentTagName;
                if (theElement.NextSibling != null)
                {
                    theElement = (XmlElement)theElement.NextSibling;
                    CurrentDepth[CurrentDepth.Count - 1] = LastTagName;
                    continue;
                }

                theElement = (XmlElement)theElement.ParentNode;
                CurrentDepth.RemoveAt(CurrentDepth.Count - 1);//I stepped one level up
                continue;
            }

            //only check the list until I get the group tag
            if (ItemGroupTag == "")
            {
                XmlNodeList theList = theXml.GetElementsByTagName(theElement.Name);
                if (theList.Count > 1)
                {
                    ItemGroupTag = CurrentTagName;
                    GroupItemDepth = CurrentDepth.Count;
                }
                else
                {
                    LastTagName = theElement.ParentNode.Name + "." + theElement.Name;
                    CurrentDepth.Add(LastTagName);//Stepped one level down
                    theElement = (XmlElement)theElement.FirstChild;
                    continue;
                }
            }
            //At this point I am in the GroupItem's entries
            //Check if the "line" has changed
            if (CurrentTagName == LastTagName && CurrentTagName == ItemGroupTag)
            {
                //It has changed. Make sure all columns have equal number of elements. If not, fill them with empty strings
                int MaxCount = 0;

                //First pass: find the number of items that every column should contain.
                //Second pass: pad the shorter columns. A column is normally at most one element short; the exception is a column
                //that appears late (as a new attribute), which is back-filled with empty strings up to the current row
                for (int loop = 0; loop < theColumns.Count; loop++) if (theColumns[loop].Count > MaxCount) MaxCount = theColumns[loop].Count;
                for (int loop = 0; loop < theColumns.Count; loop++)
                {
                    //A problem will appear in cases that the second entry is a new attribute.
                    if (MaxCount > 2 && theColumns[loop].Count == 1)
                    {
                        while (theColumns[loop].Count != MaxCount) theColumns[loop].Insert(0, "");
                    }
                    else if (theColumns[loop].Count != MaxCount) theColumns[loop].Add("");
                }
                CurrentDepth.RemoveAt(CurrentDepth.Count - 1);//Remove the last entry of the group item
            }
            //Add a column for the tag, if there is none, or find its index, IF, this tag does not have child nodes
            int theColumnIndex = theHeader.IndexOf(CurrentTagName);
            if (theColumnIndex == -1 && !theElement.HasChildNodes)
            {
                theHeader.Add(CurrentTagName);
                theColumns.Add(new List<string>());
                theColumnIndex = theHeader.Count - 1;
            }

            if (theElement.HasAttributes)
            {
                XmlAttributeCollection theAttributes = theElement.Attributes;
                for (int loop = 0; loop < theAttributes.Count; loop++)
                {
                    theColumnIndex = theHeader.IndexOf(theElement.Name + "." + theAttributes[loop].Name);
                    if (theColumnIndex == -1)
                    {
                        theColumns.Add(new List<String>());
                        theHeader.Add(theElement.Name + "." + theAttributes[loop].Name);
                        theColumnIndex = theHeader.Count - 1;
                    }
                    if (theAttributes[loop].Value == null)
                    {
                        if (CurrentTagName == LastTagName && CurrentTagName != ItemGroupTag)
                            theColumns[theColumnIndex][theColumns[theColumnIndex].Count - 1] += "|";
                        else theColumns[theColumnIndex].Add("");
                    }
                    else
                    {
                        if (CurrentTagName == LastTagName && CurrentTagName != ItemGroupTag)
                            theColumns[theColumnIndex][theColumns[theColumnIndex].Count - 1] += "|" + theAttributes[loop].Value.ToString();
                        else theColumns[theColumnIndex].Add(theAttributes[loop].Value.ToString());
                    }
                }
            }
            else if (!theElement.HasChildNodes)
            {
                if (theElement.Value == null)
                {
                    if (CurrentTagName == LastTagName && CurrentTagName != ItemGroupTag) theColumns[theColumnIndex][theColumns[theColumnIndex].Count - 1] += "|";
                    else theColumns[theColumnIndex].Add("");//empty string for a null value
                }
                else
                {
                    if (CurrentTagName == LastTagName && CurrentTagName != ItemGroupTag) theColumns[theColumnIndex][theColumns[theColumnIndex].Count - 1] += "|" +
                        theElement.Value.ToString();
                    else theColumns[theColumnIndex].Add(theElement.Value.ToString());
                }
            }

            LastTagName = CurrentTagName;
            if (theElement.HasChildNodes)
            {
                theElement = (XmlElement)theElement.FirstChild;
                CurrentDepth.Add(LastTagName);//Went down a level
            }
            else if (theElement.NextSibling != null)
            {
                theElement = (XmlElement)theElement.NextSibling;
                //Prevent overwriting a previous group level when a list item occurs
                if (theElement.ParentNode.Name + "." + theElement.Name == LastTagName && CurrentDepth[CurrentDepth.Count - 1] != LastTagName)
                    CurrentDepth.Add(LastTagName);
                else CurrentDepth[CurrentDepth.Count - 1] = LastTagName;
            }
            else
            {
                theElement = (XmlElement)theElement.ParentNode;
                CurrentDepth.RemoveAt(CurrentDepth.Count - 1);//Went up a Level
            }
        } while (theElement != theXml.DocumentElement);
        //Put Everything together and write them to a file
        List<String> theLines = new List<string>();
        String theLine = "";
        for (int loop = 0; loop < theHeader.Count; loop++) theLine += "\t" + theHeader[loop];
        if (theLine != "") theLines.Add(theLine.Substring(1));

        int aMaxCount = 0;

        //Do a last check for row count consistency in columns
        for (int loop = 0; loop < theColumns.Count; loop++) if (theColumns[loop].Count > aMaxCount) aMaxCount = theColumns[loop].Count;
        for (int loop = 0; loop < theColumns.Count; loop++)
        {
            if (aMaxCount > 2 && theColumns[loop].Count == 1)
            {
                while (theColumns[loop].Count != aMaxCount) theColumns[loop].Insert(0, "");
            }
            else if (theColumns[loop].Count != aMaxCount) theColumns[loop].Add("");
        }

        for (int loop = 0; loop < theColumns[0].Count; loop++)
        {
            theLine = "";
            for (int loop1 = 0; loop1 < theColumns.Count; loop1++) theLine += "\t" + theColumns[loop1][loop].Replace('\t', ' ');
            if (theLine != "") theLines.Add(theLine.Substring(1));
        }

        File.WriteAllLines(theExportFile, theLines.ToArray(), Encoding.UTF8);
    }
}
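
For comparison, here is a much shorter LINQ to XML sketch of the same idea (attributes become "Element.Attribute" columns, repeated values are joined with "|"). It is not a drop-in replacement for the code above: it assumes that every direct child of the root element is one record, whereas the code above detects the group tag itself. The names TabDelimitedSketch, ToTabDelimited and AddValue are made up for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class TabDelimitedSketch
{
    // Assumption: every direct child of the root element is one record (one output line).
    public static List<string> ToTabDelimited(XDocument doc)
    {
        var header = new List<string>();
        var rows = new List<Dictionary<string, string>>();

        foreach (XElement record in doc.Root.Elements())
        {
            var row = new Dictionary<string, string>();
            foreach (XElement node in record.DescendantsAndSelf())
            {
                // Attributes become columns named "Element.Attribute" (xmlns declarations are skipped).
                foreach (XAttribute attribute in node.Attributes().Where(a => !a.IsNamespaceDeclaration))
                    AddValue(header, row, node.Name.LocalName + "." + attribute.Name.LocalName, attribute.Value);

                // Leaf elements that carry text become columns named after the element.
                if (!node.HasElements && node.Value.Length > 0)
                    AddValue(header, row, node.Name.LocalName, node.Value);
            }
            rows.Add(row);
        }

        // First line is the header; a record that lacks a column gets an empty cell.
        var lines = new List<string> { string.Join("\t", header) };
        foreach (Dictionary<string, string> row in rows)
        {
            lines.Add(string.Join("\t", header.Select(column =>
            {
                string value;
                return row.TryGetValue(column, out value) ? value.Replace('\t', ' ') : "";
            })));
        }
        return lines;
    }

    // Registers the column if it is new and appends repeated values with "|".
    private static void AddValue(List<string> header, Dictionary<string, string> row, string column, string value)
    {
        if (!header.Contains(column)) header.Add(column);
        string existing;
        if (row.TryGetValue(column, out existing)) row[column] = existing + "|" + value;
        else row[column] = value;
    }

    public static void Main()
    {
        XDocument doc = XDocument.Parse(
            "<Books><Shelf Name=\"A\"><Book ID=\"10\" Author=\"Me\" /><Book ID=\"20\" Author=\"you\"/></Shelf></Books>");
        foreach (string line in ToTabDelimited(doc)) Console.WriteLine(line);
    }
}
```

For the sample in Main, the Shelf element is the single record, so the two Book entries end up in one line with their IDs and authors joined by "|".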

Upvotes: 0

Steven Doggart

Reputation: 43743

The following XSLT script will do what you want:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text"/>
    <xsl:template match="*">
        <xsl:for-each select="*">
            <xsl:value-of select="local-name()"/>
            <xsl:text> </xsl:text>
            <xsl:value-of  select="."/>
            <xsl:text>
</xsl:text>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>

Then you can apply the script using the XslCompiledTransform class like this:

private string transformXml(string sourceXmlText, string xsltText)
{
    XmlDocument sourceXmlDocument = new XmlDocument();
    sourceXmlDocument.LoadXml(sourceXmlText);
    XslCompiledTransform transformer = new XslCompiledTransform();
    XmlTextReader xsltReader = new XmlTextReader(new StringReader(xsltText));
    transformer.Load(xsltReader);
    MemoryStream outputStream = new MemoryStream();
    XmlWriter xmlWriter = XmlWriter.Create(outputStream, transformer.OutputSettings);
    transformer.Transform(sourceXmlDocument, null, xmlWriter);
    xmlWriter.Flush(); // make sure any buffered output reaches the stream before it is read
    outputStream.Position = 0;
    StreamReader streamReader = new StreamReader(outputStream);
    return streamReader.ReadToEnd();
}

It's obviously more complex than the other solutions people have posted, but it has the major advantage of being able to easily change the script if you need to change the formatting.
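
As a rough end-to-end illustration, the sketch below runs that stylesheet against the sample XML from the question. The class name XsltTextDemo and its Transform method are invented for the demo, the stylesheet is embedded as a string only to keep the example self-contained, and the line break inside xsl:text is written as &#10; so that it survives the C# string literal. It uses the Transform overload that writes to a TextWriter.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class XsltTextDemo
{
    // The stylesheet from this answer, embedded for the demo.
    private const string Stylesheet = @"<?xml version=""1.0"" encoding=""utf-8""?>
<xsl:stylesheet version=""1.0"" xmlns:xsl=""http://www.w3.org/1999/XSL/Transform"">
    <xsl:output method=""text""/>
    <xsl:template match=""*"">
        <xsl:for-each select=""*"">
            <xsl:value-of select=""local-name()""/>
            <xsl:text> </xsl:text>
            <xsl:value-of select="".""/>
            <xsl:text>&#10;</xsl:text>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>";

    public static string Transform(string xml)
    {
        var transformer = new XslCompiledTransform();
        using (XmlReader xsltReader = XmlReader.Create(new StringReader(Stylesheet)))
        {
            transformer.Load(xsltReader);
        }

        using (XmlReader input = XmlReader.Create(new StringReader(xml)))
        using (var output = new StringWriter())
        {
            // Writing to a TextWriter lets the transform emit plain text directly.
            transformer.Transform(input, null, output);
            return output.ToString();
        }
    }

    public static void Main()
    {
        string xml = @"<note>
    <to>Tove</to>
    <from>Jani</from>
    <heading>Reminder</heading>
    <body>Don't forget me this weekend!</body>
</note>";
        Console.Write(Transform(xml));
    }
}
```

Note that local-name() returns the element name exactly as written in the XML, so the lines start with "to", "from" and so on in lower case. If you need "To Tove", the capitalization has to be added in the stylesheet or in C# afterwards.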

Upvotes: 6

Simon Mourier

Reputation: 139276

Something like this:

    XmlDocument doc = new XmlDocument();
    doc.LoadXml(xml); // xml is a string holding your XML text

    StringBuilder sb = new StringBuilder();
    foreach (XmlNode node in doc.DocumentElement.ChildNodes)
    {
        sb.Append(char.ToUpper(node.Name[0]));
        sb.Append(node.Name.Substring(1));
        sb.Append(' ');
        sb.AppendLine(node.InnerText);
    }

    Console.WriteLine(sb);

Upvotes: 5

Tomtom

Reputation: 9394

You can do something like

    string xml = @"<note>
                    <to>Tove</to>
                    <from>Jani</from>
                    <heading>Reminder</heading>
                    <body>Don't forget me this weekend!</body>
                </note>";

    StringBuilder sb = new StringBuilder();
    // Iterate over the children of <note>; Descendants("note") would return only the <note> element itself
    foreach (XElement element in XDocument.Parse(xml).Root.Elements())
    {
        sb.AppendLine(string.Format("{0} {1}", element.Name, element.Value));
    }

Upvotes: 1
