sparta93

Reputation: 3854

XML like data to CSV Conversion

I have a device with a built-in logger program that generates status messages about the device and keeps appending them to a .txt file. These messages include the device status and network status, among many other things. The data in the file looks something like the following:

 <XML><DSTATUS>1,4,7,,5</DSTATUS><EVENT> hello,there,my,name,is,jack,</EVENT>
     last,name,missing,above <ANOTHERTAG>3,6,7,,8,4</ANOTHERTAG> </XML>

 <XML><DSTATUS>1,5,7,,3</DSTATUS><EVENT>hello,there,my,name,is,mary,jane</EVENT>
     last,name,not,missing,above<ANOTHERTAG>3,6,7,,8,4</ANOTHERTAG></XML>

    ... goes on

Note that it is not well-formed XML. Also, one element can have multiple parameters and can also have blanks, for example: <NETWORKSTAT>1,456,3,6,,7</NETWORKSTAT>. My objective is to write something in C# WPF that takes this text file, processes the data in it, and creates a .csv file with one event per line. For the brief example above, the first line in the .csv file would be:

1,4,7,,5,hello,there,my,name,is,jack,,last,name,missing,above,3,6,7,,8,4

I do not need help with basic C#; I know how to read a file, etc. What I have no idea about is how to approach the parsing, processing, and conversion. I'm fairly new to C#, so I'm not sure which direction to go. Any help will be appreciated!

Upvotes: 0

Views: 915

Answers (3)

jdweng

Reputation: 34419

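Because the format is non-standard, I had to switch from an XML Linq solution to a standard XmlDocument solution. In XML Linq, Elements() returns only child elements and skips text that is not inside a tag, so this version iterates over ChildNodes instead.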

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using System.Xml;
using System.Xml.Linq;

namespace ConsoleApplication1
{
    class Program
    {
        const string FILENAME = @"c:\temp\test.csv";
        static void Main(string[] args)
        {
            string input =
                "<XML><DSTATUS>1,4,7,,5</DSTATUS><EVENT> hello,there,my,name,is,jack,</EVENT>" +
                   "last,name,missing,above <ANOTHERTAG>3,6,7,,8,4</ANOTHERTAG> </XML>" +

                "<XML><DSTATUS>1,5,7,,3</DSTATUS><EVENT>hello,there,my,name,is,mary,jane</EVENT>" +
                   "last,name,not,missing,above<ANOTHERTAG>3,6,7,,8,4</ANOTHERTAG></XML>";

            input = "<Root>" + input + "</Root>";

            XmlDocument  doc = new XmlDocument();
            doc.LoadXml(input);

            StreamWriter writer = new StreamWriter(FILENAME);

            XmlNodeList rows = doc.GetElementsByTagName("XML");

            foreach (XmlNode row in rows)
            {
                List<string> children = new List<string>();
                foreach (XmlNode child in row.ChildNodes)
                {
                    children.Add(child.InnerText.Trim());
                }

                writer.WriteLine(string.Join(",", children.ToArray()));
            }

            writer.Flush();
            writer.Close();

        }
    }
}
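
The sample above hard-codes the input string. As a minimal sketch, the same approach can be applied to the text of the log file, assuming the whole file fits in memory and every <XML> record is itself well-formed. The class and method names (LogToCsv, ToCsvLines) are hypothetical, not part of any library:

```csharp
using System.Collections.Generic;
using System.Xml;

public static class LogToCsv
{
    // Turns the raw log text (a sequence of <XML>...</XML> fragments)
    // into one CSV line per fragment.
    public static List<string> ToCsvLines(string logText)
    {
        // Wrap the fragments in a single root so the text parses as one document.
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<Root>" + logText + "</Root>");

        List<string> lines = new List<string>();
        foreach (XmlNode row in doc.GetElementsByTagName("XML"))
        {
            List<string> fields = new List<string>();
            // ChildNodes holds the tagged elements and the loose text between them.
            // Whitespace-only nodes are dropped because PreserveWhitespace defaults to false.
            foreach (XmlNode child in row.ChildNodes)
            {
                fields.Add(child.InnerText.Trim());
            }
            lines.Add(string.Join(",", fields));
        }
        return lines;
    }
}
```

It could then be called as File.WriteAllLines(@"c:\temp\test.csv", LogToCsv.ToCsvLines(File.ReadAllText(path))), where path is a placeholder for the location of the logger's .txt file. Trim() also removes the line breaks that the logger puts inside a record.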

Upvotes: 1

jdweng

Reputation: 34419

Here is my solution that uses XML Linq. I create an XDocument by wrapping the fragments in a Root tag.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using System.Xml;
using System.Xml.Linq;

namespace ConsoleApplication1
{
    class Program
    {
        const string FILENAME = @"c:\temp\test.csv";
        static void Main(string[] args)
        {
            string input =
                "<XML><DSTATUS>1,4,7,,5</DSTATUS><EVENT> hello,there,my,name,is,jack,</EVENT>" +
                   "last,name,missing,above <ANOTHERTAG>3,6,7,,8,4</ANOTHERTAG> </XML>" +

                "<XML><DSTATUS>1,5,7,,3</DSTATUS><EVENT>hello,there,my,name,is,mary,jane</EVENT>" +
                   "last,name,not,missing,above<ANOTHERTAG>3,6,7,,8,4</ANOTHERTAG></XML>";

            input = "<Root>" + input + "</Root>";

            XDocument doc = XDocument.Parse(input);

            StreamWriter writer = new StreamWriter(FILENAME);

            List<XElement> rows = doc.Descendants("XML").ToList();

            foreach (XElement row in rows)
            {
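                // Elements() skips the loose text between tags, so take all child
                // nodes: every element plus any text node that is not just whitespace.
                string[] elements = row.Nodes()
                    .Where(n => n is XElement || (n is XText && ((XText)n).Value.Trim().Length > 0))
                    .Select(n => (n is XElement ? ((XElement)n).Value : ((XText)n).Value).Trim())
                    .ToArray();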
                writer.WriteLine(string.Join(",", elements));
            }

            writer.Flush();
            writer.Close();

        }
    }
}

Upvotes: 1

dbc

Reputation: 116794

Since each top-level XML node in your file is well-formed, you can use an XmlReader with XmlReaderSettings.ConformanceLevel = ConformanceLevel.Fragment to iterate through each top-level node in the file and read it with Linq-to-XML:

    public static IEnumerable<string> XmlFragmentsToCSV(string path)
    {
        using (var textReader = new StreamReader(path, Encoding.UTF8))
            foreach (var line in XmlFragmentsToCSV(textReader))
                yield return line;
    }

    public static IEnumerable<string> XmlFragmentsToCSV(TextReader textReader)
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.ConformanceLevel = ConformanceLevel.Fragment;

        using (XmlReader reader = XmlReader.Create(textReader, settings))
        {
            while (reader.Read())
            {   // Skip whitespace
                if (reader.NodeType == XmlNodeType.Element) 
                {
                    using (var subReader = reader.ReadSubtree())
                    {
                        var element = XElement.Load(subReader);
                        yield return string.Join(",", element.DescendantNodes()
                            .OfType<XText>()
                            .Select(n => n.Value.Trim())
                            .Where(t => !string.IsNullOrEmpty(t))
                            .ToArray());
                    }
                }
            }
        }
    }

To precisely match the output you wanted, I had to trim whitespace at the beginning and end of each text node value.

Also, the Where(t => !string.IsNullOrEmpty(t)) clause is to skip the whitespace node corresponding to the space here: </ANOTHERTAG> </XML>. If that space doesn't exist in the real file, you can omit that clause.
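
To get from these lines to an actual .csv file, the same fragment-reading loop can write straight to a TextWriter. The following is only a sketch of that variant, not a definitive implementation; the class and method names (FragmentCsvWriter, WriteCsv) are hypothetical:

```csharp
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

public static class FragmentCsvWriter
{
    // Reads top-level XML fragments from input and writes one CSV line per fragment.
    // Returns the number of lines written.
    public static int WriteCsv(TextReader input, TextWriter output)
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.ConformanceLevel = ConformanceLevel.Fragment;

        int count = 0;
        using (XmlReader reader = XmlReader.Create(input, settings))
        {
            while (reader.Read())
            {
                // Only top-level elements start a record; anything else between them is skipped.
                if (reader.NodeType != XmlNodeType.Element)
                    continue;

                using (XmlReader subReader = reader.ReadSubtree())
                {
                    XElement element = XElement.Load(subReader);
                    // Collect every text node, trimmed, dropping whitespace-only ones.
                    var fields = element.DescendantNodes()
                        .OfType<XText>()
                        .Select(t => t.Value.Trim())
                        .Where(v => v.Length > 0);
                    output.WriteLine(string.Join(",", fields));
                    count++;
                }
            }
        }
        return count;
    }
}
```

A caller would open a StreamReader on the log file and a StreamWriter on the target .csv file, each in a using block, and pass both to WriteCsv. Because records are handled one at a time, the whole log never has to be held in memory.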

Upvotes: 2
