programmer
programmer

Reputation: 1094

Read multiple xml tables (under the same Root node) into DataTables/DataSet

I have an XML source document with multiple "report" nodes under the Root node. I need to read each "report" node into its own DataTable. It looks like I'll either need to transform my source XML data using an xsl stylesheet to get it in the format that'll work nicely or iterate through my xml elements like so:

namespace XmlParse2
{
    class Program
    {
        static IEnumerable<string> expectedFields = new List<string>() { "Field1", "Field2", "Field3", "Field4" };

        static void Main(string[] args)
        {
            string xml = @"<Root>
                             <Report1>
                               <Row>
                                 <Field1>data1-1</Field1>
                                 <Field2>data1-2</Field2>
                                 <Field4>data1-4</Field4>
                               </Row>
                               <Row>
                                 <Field1>data2-1</Field1>
                                 <Field2>data2-2</Field2>
                               </Row>
                             </Report1>
                             <Report2>
                               <Row>
                                 <Field1>data1-1</Field1>
                                 <Field4>data1-4</Field4>
                               </Row>
                               <Row>
                                 <Field1>data2-1</Field1>
                                 <Field3>data2-3</Field3>
                               </Row>
                             </Report2>
                           </Root>";

            DataTable report1 = new DataTable("Report1");
            report1.Columns.Add("Field1");
            report1.Columns.Add("Field2");
            report1.Columns.Add("Field3");
            report1.Columns.Add("Field4");

            DataTable report2 = new DataTable("Report2");
            report2.Columns.Add("Field1");
            report2.Columns.Add("Field2");
            report2.Columns.Add("Field3");
            report2.Columns.Add("Field4");

            var doc = XDocument.Parse(xml);
            var report1Data = doc.Root.Elements("Report1").Elements("Row").Select(record => MapRecord(record));
            var report2Data = doc.Root.Elements("Report2").Elements("Row").Select(record => MapRecord(record));

            report1 = addRows(report1, report1Data);
            report2 = addRows(report2, report2Data);

            Console.ReadLine();
        }

        public static Dictionary<string, string> MapRecord(XElement element)
        {
            var output = new Dictionary<string, string>();
            foreach (var field in expectedFields)
            {
                bool hasField = element.Elements(field).Any();
                if (hasField)
                {
                    output.Add(field, element.Elements(field).First().Value);
                }
            }
            return output;
        }

        public static DataTable addRows(DataTable table, IEnumerable<Dictionary<string, string>> data)
        {
            foreach (Dictionary<string, string> dict in data)
            {
                DataRow row = table.NewRow();

                foreach(var item in dict) 
                {
                    row[item.Key] = item.Value;
                }

                table.Rows.Add(row);
            }

            return table;
        }
    }
}

The problem with my source data not working seems to be that both Report1 and Report2 have child nodes that are named "Row" and my attempts to do stuff using DataSet.ReadXml is not successful because my code just groups all nodes named Row into one DataTable instead of separate DataTables. :/

What am I missing?

Upvotes: 1

Views: 4507

Answers (1)

Sergey Berezovskiy
Sergey Berezovskiy

Reputation: 236328

XDocument xdoc = XDocument.Load(path_to_xml);
var tables = xdoc.Root.Elements()
                 .Select(report => {
                     DataTable table = new DataTable(report.Name.LocalName);
                     var fields = report
                            .Descendants("Row")
                            .SelectMany(row => row.Elements()
                                                  .Select(e => e.Name.LocalName))
                            .Distinct();

                     foreach(string field in fields)
                         table.Columns.Add(field);

                     foreach(var row in report.Descendants("Row"))
                     {
                         DataRow dr = table.NewRow();
                         foreach(var field in row.Elements())
                             dr[field.Name.LocalName] = (string)field;
                         table.Rows.Add(dr);
                     }                                   

                     return table;
                });

This query will return IEnumerable<DataTable>. Each datatable will contain only those columns, which have values in xml. Column names retrieved from xml and could be different for each table. For your sample structure will look this way:

DataTable: Report1
  Columns: Field1, Field2, Field4

DataTable: Report2
  Columns: Field1, Field3, Field4

All rows data will be added to each table.


You can extract some code to methods. It will make code easier to understand:

XDocument xdoc = XDocument.Load(path_to_xml);
var tables = xdoc.Root.Elements()
                 .Select(report => CreateTableFrom(report));

And methods:

private static DataTable CreateTableFrom(XElement report)
{
    DataTable table = new DataTable(report.Name.LocalName);
    table.Columns.AddRange(GetColumnsOf(report));

    foreach (var row in report.Descendants("Row"))
    {
        DataRow dr = table.NewRow();
        foreach (var field in row.Elements())
            dr[field.Name.LocalName] = (string)field;
        table.Rows.Add(dr);
    }

    return table;
}

private static DataColumn[] GetColumnsOf(XElement report)
{
    return report.Descendants("Row")
                 .SelectMany(row => row.Elements().Select(e => e.Name.LocalName))
                 .Distinct()
                 .Select(field => new DataColumn(field))
                 .ToArray();
}

Upvotes: 1

Related Questions