vijay
vijay

Reputation: 655

How do I convert this special case XML into CSV?

I have been trying to convert the following XML

<root>
<data>
  <MEMBNO>11400116</MEMBNO> 
  <BASIC>15400.00</BASIC> 
  <BASIC>15000.00</BASIC> 
  <BASIC>14242.50</BASIC> 
</data>
<data>
   <MEMBNO>200</MEMBNO>
   <DOB>17/02/1975</DOB>
</data>
<data>
   <MEMBNO>16</MEMBNO>
   <BASIC>26354.00</BASIC>
</data>
</root>

into CSV like this.

MEMBNO,BASIC,DOB,
11400116,"15400.00\v15000.00\v,14242.00",,
200,,17/12/1975,
16,26354.00,,

\v is a vertical tab and it is supposed to print a newline when the word mail merge happening.

Could you pleas help me out solving this?

Thank you.

I Haven't progressed anywhere. As of now I have no code with me.

Upvotes: 0

Views: 111

Answers (1)

har07
har07

Reputation: 89295

Assuming that only <BASIC> tag that can occur multiple times in one <DATA> tag, you can try as follow (see comments for what happen in detail, it is using XDocument and LINQ-to-XML) :

var xml = @"<root>
<data>
  <MEMBNO>11400116</MEMBNO> 
  <BASIC>15400.00</BASIC> 
  <BASIC>15000.00</BASIC> 
  <BASIC>14242.50</BASIC> 
</data>
<data>
   <MEMBNO>200</MEMBNO>
   <DOB>17/02/1975</DOB>
</data>
<data>
   <MEMBNO>16</MEMBNO>
   <BASIC>26354.00</BASIC>
</data>
</root>";
var doc = XDocument.Parse(xml);
var output = "";
//loop through each <data>
foreach(var data in doc.Descendants("data"))
{
    //get <MEMBNO> under current <data>
    var membno = (string)data.Element("MEMBNO");
    //get all <BASIC> under current <data>
    var basic = string.Join("\v", data.Elements("BASIC").Select(o => (string)o).ToArray());
    if (!string.IsNullOrEmpty(basic)) basic = "\"" + basic + "\"";
    //get <DOB> under current <data>
    var dob = (string)data.Element("DOB");
    //construct line with format MEMBNO,BASIC,DOB,
    var line = string.Format("{0},{1},{2},", membno, basic, dob);

    output += line + Environment.NewLine;
}
Console.WriteLine(output);

Codes above can be adjusted to accommodate unknown number of element and unknown element names under each <data>. Check updated code below, that will produce the same output as .csv sample in question without hardcoding element names (assume that doc variable is the same as in above code block) :

//get distinct list of element name under <data>
var columns = doc.Root.Elements("data").Elements().Select(o => o.Name.LocalName).Distinct().ToList();
var output = "";
//add header to 'output' : MEMBNO,BASIC,DOB,
columns.ForEach(o => output += o + ",");
foreach (var data in doc.Descendants("data"))
{
    var line = Environment.NewLine;
    foreach (var column in columns)
    {
        var cellValue = string.Join("\v", data.Elements(column).Select(o => (string)o).ToArray());
        if (!string.IsNullOrEmpty(cellValue)) cellValue = "\"" + cellValue + "\"";
        line += cellValue + ",";
    }
    output += line;
}
Console.WriteLine(output);

Upvotes: 2

Related Questions