HadoopAddict

Reputation: 225

Parse the XML by specifying the attribute names

I have an XML document from which I want to parse only specific attributes, not all of them. There are hundreds of attributes; the XML below is only a sample with a few. I want to specify the attribute names explicitly and parse their values. For example, I want to get the values of the attributes named PersonN and VerifiedHuman: my logic should find an attribute by its name, as in <Name>PersonN</Name>, and read the matching value. The result should be a CSV.

<InterConnectResponse>
  <SchemaVersion>2.0</SchemaVersion>
  <ConsumerSubjects>
    <ConsumerSubject subjectIdentifier="Primary">
      <DataSourceResponses>
        <RiskViewProducts>
          <RiskViewAttribResponse>
            <Attributes>
              <Attribute>
                <Name>PersonN</Name>
                <Value>3</Value>
              </Attribute>
              <Attribute>
                <Name>VerifiedHuman</Name>
                <Value>2</Value>
              </Attribute>
              <Attribute>
                <Name>CurrAddrBlockIndex</Name>
                <Value>0.61</Value>
              </Attribute>
              <!-- many more attributes -->
            </Attributes>
          </RiskViewAttribResponse>
        </RiskViewProducts>
      </DataSourceResponses>
    </ConsumerSubject>
  </ConsumerSubjects>
</InterConnectResponse>

The logic I am using is below; in this code, str3 holds the XML above. I don't know how to specify the attribute names and get only their values.

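// Requires: using System.IO; using System.Text; using System.Xml;
var output = new StringBuilder();

using (XmlReader read = XmlReader.Create(new StringReader(str3)))
{
    bool isValue = false; // true right after a <Value> start tag has been read
    while (read.Read())
    {
        // Entering a <Value> element: the next text node is its content
        if (read.NodeType == XmlNodeType.Element && read.Name == "Value")
        {
            isValue = true;
        }

        // Append the value text, comma-separated (currently this collects every <Value>)
        if (read.NodeType == XmlNodeType.Text && isValue)
        {
            output.Append((output.Length == 0 ? "" : ", ") + read.Value);
            isValue = false;
        }
    }
}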

Expected output:

3, 2
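The XmlReader loop above can be extended to filter by name by remembering the text of the last <Name> and emitting the next <Value> only when that name is in a wanted set. This is a minimal sketch, assuming each <Value> follows its <Name> inside the same <Attribute>; the class FilteredAttributeReader and the method ReadValues are hypothetical names used for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;

static class FilteredAttributeReader
{
    // Hypothetical helper: streams the XML and returns, in document order,
    // the <Value> of every <Attribute> whose <Name> is in `wanted`, joined by ", ".
    public static string ReadValues(string xml, ISet<string> wanted)
    {
        var output = new StringBuilder();
        string lastElement = null;  // element whose text we may be reading
        string currentName = null;  // <Name> of the current <Attribute>

        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())
            {
                switch (reader.NodeType)
                {
                    case XmlNodeType.Element:
                        lastElement = reader.Name;
                        // A new <Attribute> starts: forget the previous name
                        if (reader.Name == "Attribute") currentName = null;
                        break;
                    case XmlNodeType.Text:
                        if (lastElement == "Name")
                            currentName = reader.Value;
                        else if (lastElement == "Value" && currentName != null && wanted.Contains(currentName))
                            output.Append((output.Length == 0 ? "" : ", ") + reader.Value);
                        break;
                    case XmlNodeType.EndElement:
                        lastElement = null; // text after an end tag belongs to no tracked element
                        break;
                }
            }
        }
        return output.ToString();
    }

    static void Main()
    {
        string xml = "<Attributes>" +
            "<Attribute><Name>PersonN</Name><Value>3</Value></Attribute>" +
            "<Attribute><Name>VerifiedHuman</Name><Value>2</Value></Attribute>" +
            "<Attribute><Name>CurrAddrBlockIndex</Name><Value>0.61</Value></Attribute>" +
            "</Attributes>";
        var wanted = new HashSet<string> { "PersonN", "VerifiedHuman" };
        Console.WriteLine(ReadValues(xml, wanted)); // prints "3, 2"
    }
}
```

Because it streams, this keeps XmlReader's low memory use, which may matter if the real responses are large; the values come out in document order, not in the order of the wanted set.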

Upvotes: 0

Views: 622

Answers (2)

jdweng

Reputation: 34421

It is easiest to load all the name/value pairs into a dictionary and then extract only the ones you want. Use LINQ to XML:

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml.Linq;

namespace ConsoleApplication63
{
    class Program
    {
        const string XML_FILENAME = @"c:\temp\test.xml";
        const string CSV_FILENAME = @"c:\temp\test.csv";

        static void Main(string[] args)
        {
            XDocument doc = XDocument.Load(XML_FILENAME);

            // Map every <Name> to its <Value>; GroupBy keeps only the first value
            // if the same name appears more than once.
            Dictionary<string, string> dict = doc.Descendants("Attribute")
                .Where(x => x.Element("Name") != null)
                .GroupBy(x => (string)x.Element("Name"), y => (string)y.Element("Value"))
                .ToDictionary(x => x.Key, y => y.FirstOrDefault());

            // The attributes to extract, in the order the CSV columns should appear
            string[] attributesToRead = new[] { "PersonN", "VerifiedHuman" };

            using (StreamWriter writer = new StreamWriter(CSV_FILENAME))
            {
                // One "name,value" line per attribute:
                //foreach (string attribute in attributesToRead)
                //{
                //    writer.WriteLine(string.Join(",", attribute, dict[attribute]));
                //}

                // All values on one line; a missing attribute becomes an empty
                // field instead of throwing KeyNotFoundException.
                string output = string.Join(",", attributesToRead
                    .Select(x => dict.TryGetValue(x, out string value) ? value : ""));
                writer.WriteLine(output);
            }
        }
    }
}

Upvotes: 1

NtFreX

Reputation: 11357

If you want to group the attributes, for example by product, you could do the following:

// Requires: using System.Linq; using System.Xml.Linq; using System.Xml.XPath;
var document = XDocument.Load(fileName); // or `= XDocument.Parse(xml);`
var attributesToRead = new[] {"PersonN", "VerifiedHuman"};
var productsElements = document.XPathSelectElements("InterConnectResponse/ConsumerSubjects/ConsumerSubject/DataSourceResponses/RiskViewProducts");
// For each product, keep only the attributes whose name is in attributesToRead
var products = productsElements.Select(product => new
{
    Attributes = product.XPathSelectElements("RiskViewAttribResponse/Attributes/Attribute").Select(attribute => new
    {
        Name = attribute.Element("Name")?.Value,
        Value = attribute.Element("Value")?.Value
    }).Where(attribute => attributesToRead.Contains(attribute.Name))
});
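The Where filter keeps the attributes in document order, not in the order of attributesToRead, and silently drops names that are missing. If the CSV columns must follow the requested order, one option is a lookup per <Attributes> element. This is a minimal sketch; OrderedAttributes and ValuesInRequestedOrder are hypothetical names, and an empty field for a missing name is an assumption about the desired output.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class OrderedAttributes
{
    // Returns the values of the requested names in the order they were requested;
    // a name that does not occur yields an empty string.
    public static string[] ValuesInRequestedOrder(XElement attributesElement, string[] namesToRead)
    {
        var lookup = attributesElement.Elements("Attribute")
            .ToLookup(a => (string)a.Element("Name"), a => (string)a.Element("Value"));
        // lookup[name] is an empty sequence for an unknown name, never an exception
        return namesToRead.Select(n => lookup[n].FirstOrDefault() ?? "").ToArray();
    }

    static void Main()
    {
        var attributes = XElement.Parse(
            "<Attributes>" +
            "<Attribute><Name>PersonN</Name><Value>3</Value></Attribute>" +
            "<Attribute><Name>VerifiedHuman</Name><Value>2</Value></Attribute>" +
            "</Attributes>");
        var values = ValuesInRequestedOrder(attributes, new[] { "VerifiedHuman", "Missing", "PersonN" });
        Console.WriteLine(string.Join(", ", values)); // prints "2, , 3"
    }
}
```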

To get the desired output, you can do this:

foreach (var product in products)
{
    // Join the selected values of each product on one line, e.g. "3, 2"
    Console.WriteLine(string.Join(", ", product.Attributes.Select(attribute => attribute.Value)));
}

To create a CSV file, I recommend using a library such as CsvHelper.

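// Requires: using System.Globalization; using System.IO; using CsvHelper;
using (var writer = new StreamWriter(new FileStream(@"C:\mypath\myfile.csv", FileMode.Append)))
// Recent CsvHelper versions require a CultureInfo; InvariantCulture uses "," as the delimiter
using (var csv = new CsvWriter(writer, CultureInfo.InvariantCulture))
{
    foreach (var product in products)
    {
        foreach (var attribute in product.Attributes)
        {
            csv.WriteField(attribute.Value);
        }
        csv.NextRecord(); // one CSV row per product
    }
}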
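If adding a package is not an option, the basic quoting rules can be applied by hand. This is a minimal sketch following the RFC 4180 convention (quote a field that contains a comma, double quote or line break, and double any embedded quotes); CsvLine, Escape and Join are hypothetical names, and it does not cover everything a library like CsvHelper handles.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class CsvLine
{
    // Quotes a field only when it contains a comma, double quote, CR or LF,
    // doubling any embedded double quotes; null becomes an empty field.
    public static string Escape(string field)
    {
        if (field == null) return "";
        bool needsQuotes = field.IndexOfAny(new[] { ',', '"', '\r', '\n' }) >= 0;
        return needsQuotes ? "\"" + field.Replace("\"", "\"\"") + "\"" : field;
    }

    // Builds one CSV record from the given fields
    public static string Join(IEnumerable<string> fields) =>
        string.Join(",", fields.Select(Escape));

    static void Main()
    {
        Console.WriteLine(Join(new[] { "3", "2" }));            // 3,2
        Console.WriteLine(Join(new[] { "a,b", "say \"hi\"" })); // "a,b","say ""hi"""
    }
}
```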

Upvotes: 1
