Naveed Jawaid

Reputation: 3

Extract Specific Text from an HTML Page

The HTML page looks like this:

<tr>
<th rowspan="4" scope="row">General</th>
<td class="ttl"><a href="network-bands.php3">2G Network</a></td>
<td class="nfo">GSM 850 / 900 / 1800 / 1900 </td>
</tr><tr>
<td class="ttl"><a href="network-bands.php3">3G Network</a></td>
<td class="nfo">HSDPA 900 / 1900 / 2100 </td>
</tr>

To extract the text, I am trying to use:

var text = document.getElementsByClassName("nfo")[0].innerHTML;

(provided by Alex)

But I am getting this error:

Error 2 The name 'document' does not exist in the current context C:\Users\Nabi Javid\Documents\Visual Studio 2008\Projects\WpfApplication2\WpfApplication2\Window1.xaml.cs 30 22 WpfApplication2

Am I missing a library or something?

Currently my code looks like this:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Windows;
using System.Windows.Controls;
using System.Windows.Data;
using System.Windows.Documents;
using System.Windows.Input;
using System.Windows.Media;
using System.Windows.Media.Imaging;
using System.Windows.Navigation;
using System.Windows.Shapes;

namespace WpfApplication1
{
    /// <summary>
    /// Interaction logic for Window1.xaml
    /// </summary>
    public partial class Window1 : Window
    {
        public Window1()
        {
            InitializeComponent();
        }

        private void button1_Click(object sender, RoutedEventArgs e)
        {
            HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
            htmlDoc.Load("nokia_c5_03-3578.html");
            var text = document.getElementsByClassName("nfo")[0].innerHTML;

        } 
    }

}

Upvotes: 0

Views: 2629

Answers (5)

tbicr

Reputation: 26090

You can get elements by class name using the following method, which also matches elements that have several classes defined in one class attribute:

private HtmlNodeCollection GetElementsByClassName(HtmlDocument htmlDocument, string className)
{
    string xpath =
        String.Format(
            "//*[contains(concat(' ', normalize-space(@class), ' '), ' {0} ')]",
            className);
    return htmlDocument.DocumentNode.SelectNodes(xpath);
}
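
As a rough sketch of how the helper above might be called: the class name ClassNameDemo, the GetTexts wrapper, the sample markup and the extra "wide" class are all made up for illustration, and the code assumes the HtmlAgilityPack assembly is referenced. By default SelectNodes returns null rather than an empty collection when nothing matches, so the caller checks for that.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class ClassNameDemo
{
    // Same XPath as the helper above: matches className as a whole token
    // inside a space-separated class attribute.
    public static HtmlNodeCollection GetElementsByClassName(HtmlDocument htmlDocument, string className)
    {
        string xpath = String.Format(
            "//*[contains(concat(' ', normalize-space(@class), ' '), ' {0} ')]",
            className);
        return htmlDocument.DocumentNode.SelectNodes(xpath);
    }

    // Hypothetical wrapper: returns the trimmed text of every matching
    // element, or an empty list when nothing matches.
    public static List<string> GetTexts(string html, string className)
    {
        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(html);

        List<string> result = new List<string>();
        HtmlNodeCollection nodes = GetElementsByClassName(doc, className);
        if (nodes == null)
        {
            return result; // no element carries that class
        }
        foreach (HtmlNode node in nodes)
        {
            result.Add(node.InnerText.Trim());
        }
        return result;
    }

    public static void Main()
    {
        // Illustrative markup: the second cell has two classes, "nfo" and "wide".
        string html =
            "<table><tr>" +
            "<td class=\"ttl\">2G Network</td>" +
            "<td class=\"nfo wide\">GSM 850 / 900 / 1800 / 1900 </td>" +
            "</tr></table>";

        foreach (string text in GetTexts(html, "nfo"))
        {
            Console.WriteLine(text);
        }
    }
}
```

A plain //td[@class='nfo'] test would skip the cell in this sample because its class attribute is "nfo wide"; the token-based XPath still finds it.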

Upvotes: 0

Yogesh

Reputation: 14608

You are mixing C# code with JavaScript code.

Instead of this:

var text = document.getElementsByClassName("nfo")[0].innerHTML;

type this:

var text = htmlDoc.DocumentNode.SelectNodes("//td[@class='nfo']")[0].InnerHtml;

To keep it simple, I have left out error handling.
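
For completeness, here is a minimal sketch of the same lookup with the missing checks added. The FirstNfoDemo class, the GetFirstNfo helper and the inline sample markup are hypothetical and only for illustration; the code assumes HtmlAgilityPack is referenced. It relies on SelectNodes returning null by default when the XPath matches nothing, which is why indexing [0] directly can throw.

```csharp
using System;
using HtmlAgilityPack;

public static class FirstNfoDemo
{
    // Hypothetical helper: returns the inner HTML of the first
    // <td class="nfo"> cell, or null when the page has no such cell.
    public static string GetFirstNfo(string html)
    {
        HtmlDocument htmlDoc = new HtmlDocument();
        htmlDoc.LoadHtml(html);

        HtmlNodeCollection cells = htmlDoc.DocumentNode.SelectNodes("//td[@class='nfo']");
        if (cells == null || cells.Count == 0)
        {
            return null; // nothing matched, so there is no [0] to read
        }
        return cells[0].InnerHtml.Trim();
    }

    public static void Main()
    {
        // Illustrative markup modelled on the table in the question.
        string html =
            "<table><tr>" +
            "<td class=\"ttl\">2G Network</td>" +
            "<td class=\"nfo\">GSM 850 / 900 / 1800 / 1900 </td>" +
            "</tr></table>";

        string text = GetFirstNfo(html);
        Console.WriteLine(text ?? "No cell with class 'nfo' was found.");
    }
}
```

In the question's button1_Click handler the document is loaded from a file with htmlDoc.Load, so the same check would go right after that call instead of LoadHtml.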

Upvotes: 2

James King

Reputation: 6353

Do you want

var text = htmlDoc.getElementsByClassName("nfo")[0].innerHTML;

? I'm not familiar with HTML Agility Pack, but that would seem to make sense.

Upvotes: 0

Liviu Mandras

Reputation: 6627

In your case you must call methods on the htmlDoc variable. By the way, the HtmlDocument class does not have a method with that name. See if you can find another member that suits your needs in this list.

As the error says, the document variable does not exist in your code.

Upvotes: 0

acme

Reputation: 14856

I'm not very familiar with .NET, but it looks like you are trying to mix JavaScript code

var text = document.getElementsByClassName("nfo")[0].innerHTML;

with your .NET code...?

Upvotes: 1
