Pops

Reputation: 65

HtmlAgilityPack Nodes and Subnodes

I'm trying to parse some complex HTML using HtmlAgilityPack:

<tr>
<td width=0%>Artist:</td><td width=23% class='colour'><a href='/p/beatmaplist?q=Akiakane'>Akiakane</a></td>
<td width=0%>Circle Size:</td><td width=23% class='colour'><div class='starfield' style='width:140px'><div class='active' style='width:56px'></div></div></td>
<td width=0%>Approach Rate:</td><td class="colour"><div class='starfield' style='width:140px'><div class='active' style='width:126px'></div></div></td>
</tr>
<tr>
<td width=0%>Title:</td><td class="colour"><a href='/p/beatmaplist?q=FlashBack'>FlashBack</a></td>
<td width=0%>HP Drain:</td><td class="colour"><div class='starfield' style='width:140px'><div class='active' style='width:84px'></div></div></td>
<td width=0%><strong>Star Difficulty</strong>:</td><td width=23% class='colour'><div class='starfield' style='width:140px'><div class='active' style='width:72.9650211334px'></div></div> (5.21)</td>
</tr>
<tr>
<td width=0%>Creator:</td><td class="colour"><a href='/u/231111'>Kiiwa<a/></td>
<td width=0%>Accuracy:</td><td class="colour"><div class='starfield' style='width:140px'><div class='active' style='width:98px'></div></div></td>
<td width=0%>Length:</td><td class="colour">3:13 (2:49 drain)</td>
</tr>
<tr>
<td width=0%>Source:</td><td class="colour"><a href='/p/beatmaplist?q='></a></td>
<td width=0%>Genre:</td><td class="colour"><a href='/p/beatmaplist?g=4'>Rock</a> (<a href='/p/beatmaplist?la=3'>Japanese</a>)</td>
<td width=0%>BPM:</td><td class="colour">185</td>
</tr>
<tr>
<td width=0%>Tags:</td><td class="colour"><a href="/p/beatmaplist?q=j-pop">j-pop</a> <a href="/p/beatmaplist?q=beren">beren</a> <a href="/p/beatmaplist?q=collaboration">collaboration</a> <a href="/p/beatmaplist?q=collab">collab</a> <a href="/p/beatmaplist?q=boroboro">boroboro</a> <a href="/p/beatmaplist?q=na">na</a> <a href="/p/beatmaplist?q=ikizama">ikizama</a> <a href="/p/beatmaplist?q=niki">niki</a> <a href="/p/beatmaplist?q=niconicodouga">niconicodouga</a> <a href="/p/beatmaplist?q=toysfactory">toysfactory</a> </td>
<td width=0%>User Rating:</td><td class="colour">
<table width="100%" height="20px" style="color:#fff;">
<tr>
<td style="background-color:#BC2036;text-align:right;border:solid 1px #82000B;" width="3.37522441652">93</td>
<td style="background-color:#78AB23;text-align:left;border:solid 1px #718F0A;" width="96.6965888689">2,692</td>
</tr>

Each tr should become an object with the included td values as properties, i.e.:

public class SongInfo
{
    public string CS { get; set; }
    public string AR { get; set; }
    public string HP { get; set; }
    public string STAR { get; set; }
    public string LENGTH { get; set; }
    public string BPM { get; set; }
}

So, in this context, it should look like this:

CS should be "Circle Size: (starfield style % divided by active style %)"
AR should be "Approach Rate: (starfield style % divided by active style %)"
HP should be "HP Drain: (starfield style % divided by active style %)"
STAR should be "Star Difficulty: (starfield style % divided by active style %)"
LENGTH should be "Length: 3:13"
BPM should be "BPM: 185"

When I say (starfield style % divided by active style %), I'm referring to this code:

<div class='starfield' style='width:140px'><div class='active' style='width:56px'></div></div>

So in that situation, it should be 2.5 since 140/56 = 2.5
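
A minimal sketch of that calculation, assuming each style value contains a single width given in px. WidthRatio, ParseWidth and FromStyles are hypothetical names used only for illustration; they are not part of HtmlAgilityPack, and the sketch only covers the arithmetic, not the HTML traversal:

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class WidthRatio
{
    // Pulls the numeric width out of a style value such as "width:72.9650211334px".
    // Returns null when no width is present.
    public static double? ParseWidth(string style)
    {
        Match m = Regex.Match(style, @"width\s*:\s*(\d+(?:\.\d+)?)");
        if (!m.Success) return null;
        // Invariant culture so "72.96" parses the same on every machine.
        return double.Parse(m.Groups[1].Value, CultureInfo.InvariantCulture);
    }

    // Starfield width divided by active width, as described in the question.
    public static double? FromStyles(string starfieldStyle, string activeStyle)
    {
        double? outer = ParseWidth(starfieldStyle);
        double? inner = ParseWidth(activeStyle);
        if (outer == null || inner == null || inner == 0) return null;
        return outer / inner;
    }

    public static void Main()
    {
        double? ratio = FromStyles("width:140px", "width:56px");
        Console.WriteLine(ratio?.ToString(CultureInfo.InvariantCulture)); // 2.5
    }
}
```

Keeping the result as a double matters here, since a ratio such as 2.5 would be lost if it were converted to an integer.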

My first thought was something like this:

foreach (HtmlAgilityPack.HtmlNode node in doc.DocumentNode.SelectNodes("//tr"))
{
     foreach (HtmlAgilityPack.HtmlNode node2 in node.SelectNodes("//td[@width]=0%"))
     {

     }
}

But honestly, I have no idea how to proceed with HtmlAgilityPack, since I haven't really used it at all.

Is it possible to do what I'm asking?

Upvotes: 1

Views: 679

Answers (1)

Kent Kostelac

Reputation: 2446

I don't think you tried hard enough, since you literally just copy-pasted some code from this thread without even trying to look into XPath.

A lot of the HTML is similar, so I wrote the entire solution for you. Please read it through thoroughly, and also read the Html Agility Pack documentation and read up on XPath. Your initial XPath is wrong. It is supposed to be: "//td[@width='0%']". You could make do with "//td" (the example below uses "//td[@width='0%']"), but then you must find the relevant cells by another method. In the solution below I used the InnerText of each td to pick out the relevant ones:

using System;
using HtmlAgilityPack;

public class SongInfo
{
    public string CS { get; set; }
    public string AR { get; set; }
    public string HP { get; set; }
    public string STAR { get; set; }
    public string LENGTH { get; set; }
    public string BPM { get; set; }
}

class MainClass
{
    public static void Main(string[] args)
    {
        SongInfo song = new SongInfo();

        HtmlDocument doc = new HtmlDocument();
        doc.Load("da.html");

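        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//td[@width='0%']");
        if (nodes == null)
        {
            Console.WriteLine("No matching td nodes found.");
            return;
        }
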
        foreach (HtmlNode n in nodes)
        {
            if (n.InnerText.ToLower().Contains("circle size:"))
            {
                song.CS = n.InnerText+ " " + Convert.ToString(AlmostAnything(n.NextSibling));
            }
            if (n.InnerText.ToLower().Contains("approach rate:"))
            {
                song.AR = n.InnerText + " " + Convert.ToString(AlmostAnything(n.NextSibling));
            }
            if (n.InnerText.ToLower().Contains("hp drain:"))
            {
                song.HP = n.InnerText + " " + Convert.ToString(AlmostAnything(n.NextSibling));
            }
            if (n.InnerText.ToLower().Contains("star difficulty:"))
            {
                song.STAR = n.InnerText + " " + Convert.ToString(AlmostAnything(n.NextSibling));
            }
            if (n.InnerText.ToLower().Contains("length:"))
            {
                song.LENGTH = NextSiblingText(n);
            }
            if (n.InnerText.ToLower().Contains("bpm:"))
            {
                song.BPM = NextSiblingText(n);
            }

        }
        PrintSong(song);  
    }

    private static string NextSiblingText(HtmlNode n)
    {
        return n.NextSibling.InnerText;
    }

    // Returns the starfield width divided by the active width (140 / 56 = 2.5 for Circle Size).
    private static double AlmostAnything(HtmlNode n)
    {
        string starfield = "", activefield = "";
        HtmlDocument temp = new HtmlDocument();
        temp.LoadHtml(n.InnerHtml);

        // SelectNodes returns null when the cell contains no div at all.
        HtmlNodeCollection divs = temp.DocumentNode.SelectNodes("//div");
        if (divs == null)
        {
            return -1;
        }

        foreach (HtmlNode hN in divs)
        {
            if (hN.GetAttributeValue("class", "not found") == "starfield")
            {
                starfield = hN.GetAttributeValue("style", "style not found");
            }
            if (hN.GetAttributeValue("class", "not found") == "active")
            {
                activefield = hN.GetAttributeValue("style", "style not found");
            }
        }

        // Return the ratio as a double; Convert.ToInt32 would round 2.5 to 2.
        return ConvertStringToNum(starfield) / ConvertStringToNum(activefield);
    }

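    // Extracts the first number (digits with an optional decimal part) from a
    // style value such as "width:72.9650211334px". Returns -1 if there is none.
    private static double ConvertStringToNum(string s)
    {
        string temp = "";
        foreach (char c in s)
        {
            if (Char.IsDigit(c) || (c == '.' && temp.Length > 0))
            {
                temp += c;
            }
            else if (temp.Length > 0)
            {
                // The number has ended (e.g. at the 'p' of "px").
                break;
            }
        }

        // Also handles a number that runs to the end of the string.
        return temp.Length > 0
            ? Convert.ToDouble(temp, System.Globalization.CultureInfo.InvariantCulture)
            : -1;
    }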

    private static void PrintSong(SongInfo s)
    {
        Console.WriteLine(s.CS);
        Console.WriteLine(s.AR);
        Console.WriteLine(s.HP);
        Console.WriteLine(s.STAR);
        Console.WriteLine(s.LENGTH);
        Console.WriteLine(s.BPM);
    }


}
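
As a possible alternative to loading each cell into a temporary HtmlDocument, the lookup can be done with relative XPath on the node itself. This is only a sketch under a few assumptions: RelativeXPathSketch and ReadFirstStarfield are hypothetical names, the row is inlined with quoted attribute values (the real page writes width=0% unquoted), and only the first matching row is read.

```csharp
using System;
using HtmlAgilityPack;

public static class RelativeXPathSketch
{
    // Returns { label, starfield style, active style } for the first label cell
    // whose value cell contains a starfield, or null if there is none.
    public static string[] ReadFirstStarfield(string html)
    {
        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNodeCollection labels = doc.DocumentNode.SelectNodes("//td[@width='0%']");
        if (labels == null) return null; // SelectNodes returns null when nothing matches

        foreach (HtmlNode label in labels)
        {
            // First td after the label, even if text nodes sit in between.
            HtmlNode value = label.SelectSingleNode("following-sibling::td[1]");
            if (value == null) continue;

            // The leading "." keeps each query relative to this td.
            HtmlNode starfield = value.SelectSingleNode(".//div[@class='starfield']");
            HtmlNode active = value.SelectSingleNode(".//div[@class='active']");
            if (starfield == null || active == null) continue;

            return new[]
            {
                label.InnerText.Trim(),
                starfield.GetAttributeValue("style", ""),
                active.GetAttributeValue("style", "")
            };
        }
        return null;
    }

    public static void Main()
    {
        // One row modelled on the question's HTML, inlined so the sketch is self-contained.
        string html =
            "<table><tr>" +
            "<td width='0%'>Circle Size:</td><td width='23%' class='colour'>" +
            "<div class='starfield' style='width:140px'>" +
            "<div class='active' style='width:56px'></div></div></td>" +
            "</tr></table>";

        string[] row = ReadFirstStarfield(html);
        Console.WriteLine(row == null ? "no starfield found" : string.Join(" | ", row));
    }
}
```

The two style strings can then be passed to ConvertStringToNum from the solution above. Note that an XPath starting with "//" always searches the whole document, even when it is called on a child node, which is why the relative queries start with "." or an axis such as following-sibling.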

Upvotes: 1
