Reputation: 2165
How to parse all the XML files under a given directory as an input to the application and write its output to a text file.
Note: The XML is not always the same the nodes in the XML can vary and have any number of Child-nodes.
Any help or guidance would be really helpful on this regard :)
XML File Sample
<CATALOG>
<CD>
<TITLE>Empire Burlesque</TITLE>
<ARTIST>Bob Dylan</ARTIST>
<COUNTRY>
<CNT>USA</CNT>
<CODE>3456</CODE>
</COUNTRY>
<COMPANY>Columbia</COMPANY>
<PRICE>10.90</PRICE>
<YEAR>1985</YEAR>
</CD>
<CD>
<TITLE>Hide your heart</TITLE>
<ARTIST>Bonnie Tyler</ARTIST>
<COUNTRY>UK</COUNTRY>
<COMPANY>CBS Records</COMPANY>
<PRICE>9.90</PRICE>
<YEAR>1988</YEAR>
</CD>
</CATALOG>
C# Code
using System;
using System.Collections.Generic;
using System.Windows.Forms;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.IO;
using System.Data;
using System.Xml;
using System.Xml.Linq;
namespace XMLTagParser
{
class Program
{
static void Main(string[] args)
{
Console.WriteLine("Please Enter the Location of the file");
// get the location we want to get the sitemaps from
string dirLoc = Console.ReadLine();
// get all the sitemaps
string[] sitemaps = Directory.GetFiles(dirLoc);
StreamWriter sw = new StreamWriter(Application.StartupPath + @"\locs.txt", true);
// loop through each file
foreach (string sitemap in sitemaps)
{
try
{
// new xdoc instance
XmlDocument xDoc = new XmlDocument();
//load up the xml from the location
xDoc.Load(sitemap);
// cycle through each child noed
foreach (XmlNode node in xDoc.DocumentElement.ChildNodes)
{
// first node is the url ... have to go to nexted loc node
foreach (XmlNode locNode in node)
{
string loc = locNode.Name;
// write it to the console so you can see its working
Console.WriteLine(loc + Environment.NewLine);
// write it to the file
sw.Write(loc + Environment.NewLine);
}
}
}
catch {
Console.WriteLine("Error :-(");
}
}
Console.WriteLine("All Done :-)");
Console.ReadLine();
}
}
}
Preferred Output:
CATALOG/CD/TITLE
CATALOG/CD/ARTIST
CATALOG/CD/COUNTRY/CNT
CATALOG/CD/COUNTRY/CODE
CATALOG/CD/COMPANY
CATALOG/CD/PRICE
CATALOG/CD/YEAR
CATALOG/CD/TITLE
CATALOG/CD/ARTIST
CATALOG/CD/COUNTRY
CATALOG/CD/COMPANY
CATALOG/CD/PRICE
CATALOG/CD/YEAR
Upvotes: 2
Views: 137
Reputation: 318
This is a recursive problem, and what you are looking for is called 'tree traversal'. What this means is that for each child node, you want to look into it's children, then into that node's children (if it has any) and so on, recording the 'path' as you go along, but only printing out the names of the 'leaf' nodes.
You will need a function like this to 'traverse' the tree:
static void traverse(XmlNodeList nodes, string parentPath)
{
foreach (XmlNode node in nodes)
{
string thisPath = parentPath;
if (node.NodeType != XmlNodeType.Text)
{
//Prevent adding "#text" at the end of every chain
thisPath += "/" + node.Name;
}
if (!node.HasChildNodes)
{
//Only print out this path if it is at the end of a chain
Console.WriteLine(thisPath);
}
//Look into the child nodes using this function recursively
traverse(node.ChildNodes, thisPath);
}
}
And then here is how I would add it into your program (within your foreach sitemap
loop):
try
{
// new xdoc instance
XmlDocument xDoc = new XmlDocument();
//load up the xml from the location
xDoc.Load(sitemap);
// start traversing from the children of the root node
var rootNode = xDoc.FirstChild;
traverse(rootNode.ChildNodes, rootNode.Name);
}
catch
{
Console.WriteLine("Error :-(");
}
I made use of this other helpful answer: Traverse a XML using Recursive function
Hope this helps! :)
Upvotes: 2