sparta93

Removing line breaks from XML data before converting to CSV

So I'm currently using the following snippet in a C# WPF application to convert some XML data to CSV.

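using System.Collections.Generic;
using System.IO;
using System.Xml;

string text = File.ReadAllText(file);
// The file holds many <XML> records with no common root, so wrap them in one.
text = "<Root>" + text + "</Root>";
XmlDocument doc = new XmlDocument();
doc.LoadXml(text);
XmlNodeList rows = doc.GetElementsByTagName("XML");

// "using" flushes and closes the writer, so the last rows are not lost.
using (StreamWriter write = new StreamWriter(FILENAME1))
{
    foreach (XmlNode row in rows)
    {
        List<string> children = new List<string>();

        // Each child node (elements and the bare text between them) becomes one field.
        foreach (XmlNode child in row.ChildNodes)
        {
            children.Add(child.InnerText.Trim());
        }

        write.WriteLine(string.Join(",", children.ToArray()));
    }
}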

However, I've run into a problem. My input XML data looks something like the following (this is the raw format; note the line breaks inside the ADVISORY element):

<XML><HEADER>1.0,770162,20121009133435,3,</HEADER>20121009133435,721,5,1,0,0,0,00:00,00:00,<EVENT>00032134826064957,4627,</EVENT><DRUG>1,1872161156,7,0,10000</DRUG><DOSE>1,0,5000000,0,10000000,0</DOSE><CAREAREA>1 </CAREAREA><ENCOUNTER></ENCOUNTER><ADVISORY>Keep it simple or spell
    tham ALL out. For some reason 
    that is not the case
    please press the on button 
    when trying to activate
    device codes also available on
list</ADVISORY><CAREGIVER></CAREGIVER><PATIENT></PATIENT><LOCATION>20121009133435,00-1d-71-0a-71-80,-66</LOCATION><ROUTE></ROUTE><SITE></SITE><POWER>0,50</POWER></XML> 

Now, the problem I'm encountering is that my output looks like the text below. Since it is a CSV file, I want each record on one single row. How would I go about removing the line breaks from the raw data so that the output is on a single horizontal line? I'm not sure how to approach this. Would Replace(System.Environment.NewLine, "") work? Any help will be appreciated!

1.0,770162,20121009133435,3,,20121009133435,721,5,1,0,0,0,00:00,00:00,,00032134826064957,4627,1,,1872161156,7,0,10000,1,0,5000000,0,10000000,0,1 ,,Keep it simple or spell
    tham ALL out. For some reason 
    that is not the case
    please press the on button 
    when trying to activate
    device codes also available on
list,,,20121009133435,00-1d-71-0a-71-80,-66,,,0,50 
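Whether Replace(System.Environment.NewLine, "") would help can be checked on a shortened copy of the ADVISORY text. The sketch below is only illustrative: the class NewlineCheck and its sample string are hypothetical, and the sample assumes "\n" line endings. Environment.NewLine is "\r\n" on Windows and "\n" elsewhere, so that Replace only matches when the file happens to use the same line ending as the machine, and even then it leaves the indentation in place and glues "out." directly onto "list".

```csharp
using System;
using System.Text.RegularExpressions;

public static class NewlineCheck
{
    // Hypothetical, shortened copy of the multi-line ADVISORY text,
    // with indentation and "\n" line endings.
    public const string Sample = "Keep it simple or spell\n    tham ALL out.\nlist";

    // The approach asked about: removes only the exact Environment.NewLine sequence.
    public static string ReplaceNewLine(string s) => s.Replace(Environment.NewLine, "");

    // Collapses every run of whitespace (spaces, tabs, \r, \n) into one space,
    // then trims the ends.
    public static string CollapseWhitespace(string s) => Regex.Replace(s, @"\s+", " ").Trim();
}
```

On Linux, ReplaceNewLine(Sample) returns "Keep it simple or spell    tham ALL out.list"; on Windows the "\n" endings do not match Environment.NewLine, so the string comes back unchanged. CollapseWhitespace(Sample) returns "Keep it simple or spell tham ALL out. list" on either platform.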

EDIT:

Also note that my input file has several thousand records like the ones shown below:

<XML><HEADER>1.0,770162,20121009133435,3,</HEADER>20121009133435,721,5,1,0,0,0,00:00,00:00,<EVENT>00032134826064957,4627,</EVENT><DRUG>1,1872161156,7,0,10000</DRUG><DOSE>1,0,5000000,0,10000000,0</DOSE><CAREAREA>1 </CAREAREA><ENCOUNTER></ENCOUNTER><ADVISORY>Keep it simple or spell
        tham ALL out. For some reason 
        that is not the case
        please press the on button 
        when trying to activate
        device codes also available on
    list</ADVISORY><CAREGIVER></CAREGIVER><PATIENT></PATIENT><LOCATION>20121009133435,00-1d-71-0a-71-80,-66</LOCATION><ROUTE></ROUTE><SITE></SITE><POWER>0,50</POWER></XML> 
<XML><HEADER>2.0,773162,20121009133435,3,</HEADER>20121004133435,761,5,1,0,0,0,00:00,00:00,<EVENT>00032134826064957,4627,</EVENT><DRUG>1,18735166156,7,0,10000</DRUG><DOSE>1,0,5000000,0,10000000,0</DOSE><CAREAREA>1 </CAREAREA><ENCOUNTER></ENCOUNTER><ADVISORY>Keep it simple or spell
        tham ALL out. For some reason 
        that is not the case
        please press the on button 
        when trying to activate
        device codes also available on
    list</ADVISORY><CAREGIVER></CAREGIVER><PATIENT></PATIENT><LOCATION>20121009133435,00-1d-71-0a-71-80,-66</LOCATION><ROUTE></ROUTE><SITE></SITE><POWER>0,50</POWER></XML> 

... and so on

Answers (1)

sirdank

Try

children.Add(Regex.Replace(child.InnerText, "\\s+", " ").Trim());

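This doesn't depend on any specific newline character (\r\n or \n), and it also gets rid of the indentation spaces at the start of each continued line. \s matches any whitespace character (space, tab, carriage return, line feed) and + means one or more occurrences, so every run of whitespace is collapsed into a single space. Regex is in the System.Text.RegularExpressions namespace.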
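Putting the regex into the question's loop, a minimal sketch of the whole conversion might look like the following. It assumes the same input layout as the question (many <XML> records with no common root) and that every child node, including the bare text between </HEADER> and <EVENT>, becomes one field. The class and method names (XmlToCsv, RowToCsv, ConvertText, ConvertFile) are hypothetical. It adds no CSV quoting, because the fields here are themselves comma-separated lists that the question joins as-is.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml;

public static class XmlToCsv
{
    // Matches any run of whitespace, including \r, \n, tabs and indentation.
    private static readonly Regex Whitespace = new Regex(@"\s+");

    // Turns one <XML> record into a single CSV line: each child node's text
    // becomes one field, with line breaks and indentation collapsed to one space.
    public static string RowToCsv(XmlNode row)
    {
        IEnumerable<string> fields = row.ChildNodes
            .Cast<XmlNode>()
            .Select(child => Whitespace.Replace(child.InnerText, " ").Trim());
        return string.Join(",", fields);
    }

    // Converts text containing many top-level <XML> records into CSV lines.
    public static IEnumerable<string> ConvertText(string xmlText)
    {
        var doc = new XmlDocument();
        // Wrap the records so the text has a single root element, as in the question.
        doc.LoadXml("<Root>" + xmlText + "</Root>");
        foreach (XmlNode row in doc.GetElementsByTagName("XML"))
        {
            yield return RowToCsv(row);
        }
    }

    // Reads inputPath and writes one CSV line per <XML> record to outputPath.
    public static void ConvertFile(string inputPath, string outputPath)
    {
        File.WriteAllLines(outputPath, ConvertText(File.ReadAllText(inputPath)));
    }
}
```

For example, XmlToCsv.ConvertFile(file, FILENAME1) would replace the question's loop; File.WriteAllLines opens and closes the output file itself, so no writer is left undisposed. Keeping one Regex instance in a static field avoids re-parsing the pattern for each of the several thousand records, although the static Regex.Replace call also works because .NET caches recently used patterns.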