Reputation: 3854
I'm currently using the following snippet in a C# WPF application to convert some XML data to CSV:
string text = File.ReadAllText(file);
// Wrap the fragments in a single root element so the text parses as one XML document.
text = "<Root>" + text + "</Root>";
XmlDocument doc = new XmlDocument();
doc.LoadXml(text);
// The using block flushes and closes the output file once all rows are written.
using (StreamWriter write = new StreamWriter(FILENAME1))
{
    XmlNodeList rows = doc.GetElementsByTagName("XML");
    foreach (XmlNode row in rows)
    {
        List<string> children = new List<string>();
        foreach (XmlNode child in row.ChildNodes)
        {
            children.Add(child.InnerText.Trim());
        }
        write.WriteLine(string.Join(",", children.ToArray()));
    }
}
However, I've run into a problem. My input XML data looks something like the following (sorry, you have to scroll horizontally to see what the raw data actually looks like):
<XML><HEADER>1.0,770162,20121009133435,3,</HEADER>20121009133435,721,5,1,0,0,0,00:00,00:00,<EVENT>00032134826064957,4627,</EVENT><DRUG>1,1872161156,7,0,10000</DRUG><DOSE>1,0,5000000,0,10000000,0</DOSE><CAREAREA>1 </CAREAREA><ENCOUNTER></ENCOUNTER><ADVISORY>Keep it simple or spell
tham ALL out. For some reason
that is not the case
please press the on button
when trying to activate
device codes also available on
list</ADVISORY><CAREGIVER></CAREGIVER><PATIENT></PATIENT><LOCATION>20121009133435,00-1d-71-0a-71-80,-66</LOCATION><ROUTE></ROUTE><SITE></SITE><POWER>0,50</POWER></XML>
Now, the problem I'm encountering is that my output looks like the block given below. Since it is a CSV file, I want the output to be in one single row. How would I go about removing the line breaks from the raw data so that the output is on a single horizontal line? I'm lost as to how to approach this. Would Replace(System.Environment.NewLine, "") work? Any help will be appreciated!
1.0,770162,20121009133435,3,,20121009133435,721,5,1,0,0,0,00:00,00:00,,00032134826064957,4627,1,,1872161156,7,0,10000,1,0,5000000,0,10000000,0,1 ,,Keep it simple or spell
tham ALL out. For some reason
that is not the case
please press the on button
when trying to activate
device codes also available on
list,,,20121009133435,00-1d-71-0a-71-80,-66,,,0,50
EDIT:
Also note that my input file has several thousand lines like the ones shown below:
<XML><HEADER>1.0,770162,20121009133435,3,</HEADER>20121009133435,721,5,1,0,0,0,00:00,00:00,<EVENT>00032134826064957,4627,</EVENT><DRUG>1,1872161156,7,0,10000</DRUG><DOSE>1,0,5000000,0,10000000,0</DOSE><CAREAREA>1 </CAREAREA><ENCOUNTER></ENCOUNTER><ADVISORY>Keep it simple or spell
tham ALL out. For some reason
that is not the case
please press the on button
when trying to activate
device codes also available on
list</ADVISORY><CAREGIVER></CAREGIVER><PATIENT></PATIENT><LOCATION>20121009133435,00-1d-71-0a-71-80,-66</LOCATION><ROUTE></ROUTE><SITE></SITE><POWER>0,50</POWER></XML>
<XML><HEADER>2.0,773162,20121009133435,3,</HEADER>20121004133435,761,5,1,0,0,0,00:00,00:00,<EVENT>00032134826064957,4627,</EVENT><DRUG>1,18735166156,7,0,10000</DRUG><DOSE>1,0,5000000,0,10000000,0</DOSE><CAREAREA>1 </CAREAREA><ENCOUNTER></ENCOUNTER><ADVISORY>Keep it simple or spell
tham ALL out. For some reason
that is not the case
please press the on button
when trying to activate
device codes also available on
list</ADVISORY><CAREGIVER></CAREGIVER><PATIENT></PATIENT><LOCATION>20121009133435,00-1d-71-0a-71-80,-66</LOCATION><ROUTE></ROUTE><SITE></SITE><POWER>0,50</POWER></XML>
... and so on
Upvotes: 1
Views: 525
Reputation: 3581
Try
children.Add(Regex.Replace(child.InnerText, "\\s+", " "));
This shouldn't depend on any specific newline character, and it will also get rid of the four spaces between the lines. \s is the regex for any whitespace character and + means one or more occurrences.
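As a rough, self-contained sketch of the idea above: the class name WhitespaceDemo and the helper Collapse are hypothetical names made up for this illustration, and the sample string is only an approximation of the ADVISORY text. The sketch also adds a Trim() call, as in the question's original code, so that leading or trailing whitespace does not leave a stray space in the field. For comparison, Replace(System.Environment.NewLine, "") would only remove "\r\n" sequences on Windows, would leave any indentation in place, and would glue the last word of one line to the first word of the next.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WhitespaceDemo
{
    // Hypothetical helper: replaces every run of whitespace (spaces, tabs,
    // "\r", "\n") with a single space, then trims both ends of the result.
    public static string Collapse(string value)
    {
        return Regex.Replace(value, @"\s+", " ").Trim();
    }

    public static void Main()
    {
        // Assumed sample: one Unix newline, one Windows newline, indentation,
        // and a trailing space.
        string advisory = "Keep it simple or spell\n    tham ALL out.\r\n    For some reason ";

        // Prints: Keep it simple or spell tham ALL out. For some reason
        Console.WriteLine(Collapse(advisory));
    }
}
```

In the question's loop this would amount to children.Add(WhitespaceDemo.Collapse(child.InnerText)), assuming the helper is added to the project.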
Upvotes: 1