user2087008

HTMLAgilityPack and XPath Target

I have the following HTML:

<table>
    <tr>
        <td><a href="#">Tournament Name</a>
            <br /> Tournament Address </td>
    </tr>

    <tr>
        <td><a>View Available Space and Book Online</a></td>
    </tr>

    <tr>
        <td>
            <em>Event Cost:</em> $$$
        </td>

        <td> Date and Time </td>
    </tr>

    <tr>

        <td>
            <p>
                <strong>
                    <img title="Boy's Teams can enter this tournament" />
                    <img  title="Girl's Teams can not enter this tournament" />
                    <img  title="Disabled Teams can not enter this tournament" />
                </strong>
            </p>
        </td>

        <td>
            TimeFrame
        </td>

    </tr>

     <tr>
       <td>
            <img src="image.gif" />
            <img src="image.gif" />
            <img src="image.gif" />
            <img src="image.gif" />
            <img src="image.gif" />
            <img src="image.gif" />
            <img src="image.gif" />
            <img src="image...." />
            <img src="image...." />
            <img src="image...." />
            <img src="image...." />
        </td>
    </tr>
</table>

(This table is repeated many times on the page).

I'm trying to extract the Tournament Name.

I have the following C# code:

namespace AcademyScraper
{
    public partial class Main : Form
    {
        public Main()
        {
            InitializeComponent();
        }


        private void saveBtn_Click(object sender, EventArgs e)
        {

            string url = "http://www.reddishvulcans.com/uk_tournament_database.asp";
            var Webget = new HtmlWeb();
            var doc = Webget.Load(url);

            var root = doc.DocumentNode;
            var nodes = root.Descendants();

            HtmlNodeCollection tableCollection = doc.DocumentNode.SelectNodes("//div[@class='infobox']/table");

            for (Int32 i = 0; i < tableCollection.Count(); i++)
            {
            HtmlNode tournamentName = tableCollection[i].SelectSingleNode("/tr[1]/td/a");

            MessageBox.Show(tournamentName.InnerText);
            // I get an exception here

            }

        }


    }
}

The problem I'm having is that, no matter what I try, I can't seem to target the tag containing the tournament name. If I do MessageBox.Show(tableCollection[i].OuterHtml);, the table contents render fine in the message box. However, I get a null reference exception whenever I try to read tournamentName. Based on the HTML, I think the XPath should be right.

Upvotes: 1

Views: 214

Answers (3)

har07

Reputation: 89285

The following XPath seems to work fine for me:

//div[@class='infobox']/table/tr/td[br]/a

Console application demo:

string url = "http://www.reddishvulcans.com/uk_tournament_database.asp";
var Webget = new HtmlWeb();
var doc = Webget.Load(url);

// print the first 10 results, just for the sake of the demo
var result = doc.DocumentNode
                .SelectNodes("//div[@class='infobox']/table/tr/td[br]/a")
                .Take(10);
foreach (HtmlNode node in result)
{
    Console.WriteLine(node.InnerText);
}

Output:

The North West Junior Champions League 2016
PLAY AT CHELSEA - STAMFORD BRIDGE FOOTBALL TOURNAMENT 2016
PLAY AT FC BARCELONA -  CAMP NOU FOOTBALL TOUR 2016 - THE EUROPA CUP
Silverdale Soccersevens XIX
NORTH HALIFAX MINI SOCCER TOURNAMENT 2016
Halton & District JFL Mini Soccer Tournament
Colwyn Bay FC Junior Tournament
GMCJFC Pat Mangan Festival of Football 2016
Fred England Trophy
Fred England Trophy
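
For completeness, here is a minimal sketch of how the per-table loop from the question could be kept. In XPath, an expression that starts with "/" is evaluated from the document root even when it is called on a child node, so "/tr[1]/td/a" finds nothing and SelectSingleNode returns null. A relative expression ("./tr[1]/td/a") is evaluated from the current table instead. The inline HTML below is a shortened stand-in for the real page, and the class name RelativeXPathDemo is made up for illustration; the HtmlAgilityPack NuGet package is assumed to be installed.

```csharp
using System;
using HtmlAgilityPack; // NuGet package, assumed to be installed

class RelativeXPathDemo
{
    static void Main()
    {
        // Shortened sample standing in for the real page (an assumption for this demo).
        var html = @"<div class='infobox'><table>
            <tr><td><a href='#'>Tournament Name</a><br /> Tournament Address </td></tr>
            <tr><td><a>View Available Space and Book Online</a></td></tr>
        </table></div>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var tables = doc.DocumentNode.SelectNodes("//div[@class='infobox']/table");
        if (tables == null) return; // SelectNodes returns null when nothing matches

        foreach (HtmlNode table in tables)
        {
            // "./" keeps the query relative to the current table;
            // a leading "/" would restart from the document root.
            HtmlNode link = table.SelectSingleNode("./tr[1]/td/a");
            if (link != null)
                Console.WriteLine(link.InnerText.Trim());
        }
    }
}
```

The null checks are there because both SelectNodes and SelectSingleNode return null rather than an empty result when the expression matches nothing.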

Upvotes: 1

Onur Keskin

Reputation: 140

Maybe you can try something like this (I created a console app to try it):

private void saveBtn_Click(object sender, EventArgs e)
{
    string url = "http://www.reddishvulcans.com/uk_tournament_database.asp";
    var Webget = new HtmlWeb();
    var doc = Webget.Load(url);
    var aTags = doc.DocumentNode.SelectNodes("//div[@class='infobox']/table/tr/td[1]/a");

    foreach (var tag in aTags)
    {
        Console.WriteLine(tag.InnerText);
    }

    Console.ReadLine();
}
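
One thing to watch with td[1]/a: it selects a link in the first cell of every row, so on the markup shown in the question it would be expected to pick up the "View Available Space and Book Online" link as well as the tournament name. The sketch below counts matches for td[1]/a and for td[br]/a on a shortened copy of that markup so the two can be compared. The inline HTML and the class name PredicateComparison are assumptions for illustration, and the live page may be structured differently.

```csharp
using System;
using HtmlAgilityPack; // NuGet package, assumed to be installed

class PredicateComparison
{
    static void Main()
    {
        var doc = new HtmlDocument();
        // Shortened copy of the question's table (an assumption for this demo).
        doc.LoadHtml(@"<div class='infobox'><table>
            <tr><td><a href='#'>Tournament Name</a><br /> Tournament Address </td></tr>
            <tr><td><a>View Available Space and Book Online</a></td></tr>
        </table></div>");

        string[] xpaths =
        {
            "//div[@class='infobox']/table/tr/td[1]/a",  // first cell of any row
            "//div[@class='infobox']/table/tr/td[br]/a"  // only cells that contain a <br>
        };

        foreach (string xpath in xpaths)
        {
            // SelectNodes returns null when nothing matches, so guard before counting.
            var nodes = doc.DocumentNode.SelectNodes(xpath);
            Console.WriteLine("{0} -> {1} match(es)", xpath, nodes == null ? 0 : nodes.Count);
        }
    }
}
```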

Upvotes: 1

hoangdv

Reputation: 16127

var doc = Webget.Load(url); is a network call that can take some time, but you are running it on the main thread, which causes a conflict. You need to run the network task on another thread. Note that MessageBox.Show(tournamentName.InnerText); has to run on the UI thread (the main thread), so you should call it through an Invoke delegate.
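
A minimal sketch of that idea, assuming a WinForms project with the HtmlAgilityPack package installed. It uses Task.Run with async/await in place of an explicit Invoke: in WinForms the code after await resumes on the UI thread, so MessageBox.Show can be called directly. The form, button, and handler names are made up for illustration. This keeps the window responsive during the download; it does not by itself change what the XPath matches.

```csharp
using System;
using System.Threading.Tasks;
using System.Windows.Forms;
using HtmlAgilityPack; // NuGet package, assumed to be installed

public class MainForm : Form
{
    // Hypothetical button standing in for saveBtn from the question.
    private readonly Button saveBtn = new Button { Text = "Load", Dock = DockStyle.Top };

    public MainForm()
    {
        Controls.Add(saveBtn);
        saveBtn.Click += SaveBtn_Click;
    }

    private async void SaveBtn_Click(object sender, EventArgs e)
    {
        string url = "http://www.reddishvulcans.com/uk_tournament_database.asp";

        // Download and parse on a thread-pool thread so the UI thread is not blocked.
        HtmlDocument doc = await Task.Run(() => new HtmlWeb().Load(url));

        // Execution is back on the UI thread here, so no Invoke call is needed.
        HtmlNode first = doc.DocumentNode
            .SelectSingleNode("//div[@class='infobox']/table/tr/td[br]/a");
        MessageBox.Show(first != null ? first.InnerText : "No match");
    }

    [STAThread]
    static void Main()
    {
        Application.EnableVisualStyles();
        Application.Run(new MainForm());
    }
}
```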

Upvotes: 0
