RTRokzzz

Reputation: 235

Recursively searching for a pattern in a string

I am using C#. I have the following string:

<li> 
    <a href="abc">P1</a> 
    <ul>
        <li><a href = "bcd">P11</a></li>
        <li><a href = "bcd">P12</a></li>
        <li><a href = "bcd">P13</a></li>
        <li><a href = "bcd">P14</a></li>
    </ul>
</li>
<li> 
    <a href="abc">P2</a> 
    <ul>
        <li><a href = "bcd">P21</a></li>
        <li><a href = "bcd">P22</a></li>
        <li><a href = "bcd">P23</a></li>
    </ul>
</li>
<li> 
    <a href="abc">P3</a> 
    <ul>
        <li><a href = "bcd">P31</a></li>
        <li><a href = "bcd">P32</a></li>
        <li><a href = "bcd">P33</a></li>
        <li><a href = "bcd">P34</a></li>
    </ul>
</li>
<li> 
    <a href="abc">P4</a> 
    <ul>
        <li><a href = "bcd">P41</a></li>
        <li><a href = "bcd">P42</a></li>
    </ul>
</li>

My aim is to fill the following list from the above string.

List<class1>

class1 has two properties,

string parent;
List<string> children;

It should put P1 in parent and P11, P12, P13, P14 in children, do the same for each of the other top-level items, and collect the results in a list.

Any suggestion will be helpful.

Edit

Sample

public List<class1> getElements()
{
    List<class1> temp = new List<class1>();
    // foreach (top-level <a> element in the string)
    {
        // in the recursive loop
        List<string> str = new List<string>();
        str.Add("P11");
        str.Add("P12");
        str.Add("P13");
        str.Add("P14");

        class1 obj = new class1("P1", str);
        temp.Add(obj);
    }
    return temp;
}

The values are hard-coded here, but in practice they would be dynamic.

Upvotes: 7

Views: 797

Answers (3)

Tim Schmelter

Reputation: 460148

If you can't use a third-party tool like the Html Agility Pack (which I recommend), you could use the WebBrowser class and the HtmlDocument class to parse the HTML:

// Needs: using System.Collections.Generic; using System.Linq; using System.Windows.Forms;
WebBrowser wbc = new WebBrowser();
wbc.DocumentText = "foo"; // necessary to create the document
HtmlDocument doc = wbc.Document.OpenNew(true);
doc.Write((string)html); // insert your HTML string here
List<class1> elements = wbc.Document.GetElementsByTagName("li").Cast<HtmlElement>()
    .Where(li => li.Children.Count == 2)
    .Select(outerLi => new class1
    {
        parent = outerLi.FirstChild.InnerText,
        children = outerLi.Children.Cast<HtmlElement>()
            .Last().Children.Cast<HtmlElement>()
            .Select(innerLi => innerLi.FirstChild.InnerText).ToList()
    }).ToList();

Here's the result in the debugger window: [screenshot]

Upvotes: 3

Arie

Reputation: 5373

You can also use XmlDocument:

XmlDocument doc = new XmlDocument();
// LoadXml needs a single root element, so wrap the fragment in one
doc.LoadXml("<root>" + yourInputString + "</root>");
XmlNodeList colNodes = doc.DocumentElement.SelectNodes("li");
foreach (XmlNode node in colNodes)
{
    // ... your logic here
    // for example
    // string parentName = node.SelectSingleNode("a").InnerText;
    // string parentHref = node.SelectSingleNode("a").Attributes["href"].Value;
    // XmlNodeList children = 
    //       node.SelectSingleNode("ul").SelectNodes("li");
    // foreach (XmlNode child in children)
    // {
    //         ......
    // }
}
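
As a rough illustration, the outline above could be filled in along these lines. This is only a sketch: it assumes the input is well-formed XML (as the sample in the question is) and wraps it in a made-up <root> element, because LoadXml expects a single root. The class1 shape is taken from the question; the XmlMenuParser class and its Parse method are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Mirrors the class described in the question.
public class class1
{
    public string parent;
    public List<string> children = new List<string>();
}

// Hypothetical helper, not part of any library.
public static class XmlMenuParser
{
    public static List<class1> Parse(string html)
    {
        // XML allows exactly one root element, so wrap the fragment.
        var doc = new XmlDocument();
        doc.LoadXml("<root>" + html + "</root>");

        var result = new List<class1>();
        // Top-level <li> elements are the parents.
        foreach (XmlNode li in doc.DocumentElement.SelectNodes("li"))
        {
            // Assumes every top-level <li> contains an <a>.
            var item = new class1 { parent = li.SelectSingleNode("a").InnerText };

            // The links inside the nested <ul> are the children.
            foreach (XmlNode a in li.SelectNodes("ul/li/a"))
            {
                item.children.Add(a.InnerText);
            }
            result.Add(item);
        }
        return result;
    }
}
```

Calling XmlMenuParser.Parse(yourInputString) on the string from the question should give one class1 per top-level <li>. Real-world HTML is often not valid XML (unclosed tags, unquoted attributes), in which case LoadXml throws and an HTML parser is the safer choice.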

Upvotes: 1

slebetman

Reputation: 113906

What you want is a recursive descent parser. The other answers, which suggest libraries, are essentially suggesting that you use a recursive descent parser for HTML or XML that someone else has already written.

The basic structure of a recursive descent parser is to scan linearly through a list of tokens (in your case, a string) and, upon encountering a token that delimits a sub-entity, call the parser again to process the sub-list of tokens (the substring).

You can Google the term "recursive descent parser" and find plenty of useful results. Even the Wikipedia article is fairly good in this case and includes an example of a recursive descent parser in C.
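
To make the idea concrete, here is a minimal sketch of such a parser in C# for the specific structure in the question. It is not a general HTML parser: it assumes the input only ever contains <li>, <a> and <ul> tags in the arrangement shown, and that attribute values never contain '>'. The Node and ListParser names are made up for this illustration.

```csharp
using System;
using System.Collections.Generic;

// One list entry: the link text plus any nested entries.
public class Node
{
    public string Text;
    public List<Node> Children = new List<Node>();
}

// Hypothetical hand-written parser for the grammar:
//   items  := item*
//   item   := "<li>" anchor [ "<ul>" items "</ul>" ] "</li>"
//   anchor := "<a ...>" text "</a>"
public class ListParser
{
    private readonly string s;
    private int pos;

    public ListParser(string input) { s = input; }

    public List<Node> ParseItems()
    {
        var items = new List<Node>();
        while (Peek("<li")) items.Add(ParseItem());
        return items;
    }

    private Node ParseItem()
    {
        ExpectTag("<li");
        var node = new Node { Text = ParseAnchor() };
        if (Peek("<ul"))
        {
            ExpectTag("<ul");
            node.Children = ParseItems(); // recursion: the sub-list uses the same rule
            ExpectTag("</ul");
        }
        ExpectTag("</li");
        return node;
    }

    private string ParseAnchor()
    {
        ExpectTag("<a");
        int end = s.IndexOf('<', pos);
        if (end < 0) throw new FormatException("Unterminated <a> at " + pos);
        string text = s.Substring(pos, end - pos).Trim();
        pos = end;
        ExpectTag("</a");
        return text;
    }

    // Skips whitespace, then reports whether the input continues with tagStart.
    private bool Peek(string tagStart)
    {
        while (pos < s.Length && char.IsWhiteSpace(s[pos])) pos++;
        return pos + tagStart.Length <= s.Length
            && string.CompareOrdinal(s, pos, tagStart, 0, tagStart.Length) == 0;
    }

    // Consumes a tag beginning with tagStart, up to and including its '>'.
    private void ExpectTag(string tagStart)
    {
        if (!Peek(tagStart))
            throw new FormatException("Expected " + tagStart + " at " + pos);
        int close = s.IndexOf('>', pos);
        if (close < 0) throw new FormatException("Unterminated tag at " + pos);
        pos = close + 1;
    }
}
```

With this sketch, new ListParser(yourInputString).ParseItems() returns one Node per top-level <li>. Each node's Text corresponds to parent in the question, and the Text of each entry in Children corresponds to children. Because ParseItem calls ParseItems for a nested <ul>, deeper nesting would be handled by the same two methods.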

Upvotes: 4
