rheitzman


Read HTML File in VB.Net

I have some files that were displayed in a browser and then saved locally with File > Save As.... The pages contain scripts and do not display properly in a WebBrowser control on a WinForm; the control shows "script error" dialogs, so the scripts appear to be the problem. I don't actually need to view the files, only to retrieve a few elements by ID.

The first block of code below loads the file into a local object, but only the first 4096 bytes. (The same happens if I use a WebBrowser placed on the form.)

The second block doesn't raise an error, but GetElementById fails because the desired element lies beyond the first 4096 bytes.

    Dim web As New WebBrowser
    web.AllowWebBrowserDrop = False
    web.ScriptErrorsSuppressed = True
    web.Url = New Uri(sFile)

    Dim doc As HtmlDocument
    Dim elem As HtmlElement
    doc = web.Document
    elem = doc.GetElementById("userParts")

What am I doing wrong?

Is there a better approach for a VB.Net WinForm project for loading an HTML document from which I can read elements?


I just went with string functions for the simple task at hand:

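    Function GetInnerTextByID(html As String, elemID As String) As String
        Try
            ' Skip the <head>; "<body" also matches a body tag that has attributes.
            Dim s As String = html.Substring(html.IndexOf("<body", StringComparison.OrdinalIgnoreCase))
            ' Jump to the first occurrence of the ID (assumed to be the element's id attribute).
            s = s.Substring(s.IndexOf(elemID, StringComparison.Ordinal))
            ' Skip the rest of the opening tag, then keep the text up to the next tag.
            s = s.Substring(s.IndexOf(">") + 1)
            s = s.Substring(0, s.IndexOf("<"))
            Return s.Replace(vbCr, "").Replace(vbLf, "").Trim()
        Catch ex As Exception
            ' Body or element not found (IndexOf returned -1): return an empty string.
            Return ""
        End Try
    End Function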

I'd still be interested in a native VB.NET (non-ASP) approach, or in an explanation of why the code in my original post only loads 4096 bytes.
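A likely cause of the truncation, offered as an assumption rather than a confirmed diagnosis: setting `Url` only starts an asynchronous navigation, so reading `web.Document` on the very next line returns whatever has been loaded so far. The minimal sketch below waits for the `DocumentCompleted` event before calling `GetElementById`. It assumes a console-style entry point; the module name `ReadSavedPage` and the file path are hypothetical, and the `WebBrowser` needs an STA thread with a running message loop to raise its events.

```vbnet
Imports System
Imports System.Windows.Forms

Module ReadSavedPage
    ' Hypothetical path to a page saved with File > Save As...
    Private Const SavedFile As String = "C:\temp\saved-page.html"

    <STAThread>
    Sub Main()
        Dim web As New WebBrowser()
        web.ScriptErrorsSuppressed = True

        ' Navigation is asynchronous: only touch the DOM once loading has finished.
        AddHandler web.DocumentCompleted,
            Sub(sender As Object, e As WebBrowserDocumentCompletedEventArgs)
                ' Pages with frames raise DocumentCompleted more than once;
                ' wait until the browser reports the whole page as complete.
                If web.ReadyState <> WebBrowserReadyState.Complete Then Return

                Dim elem As HtmlElement = web.Document.GetElementById("userParts")
                Console.WriteLine(If(elem Is Nothing, "(element not found)", elem.InnerText))

                ' Stop the message loop started below.
                Application.ExitThread()
            End Sub

        web.Url = New Uri(SavedFile)

        ' The WebBrowser control only raises its events while a message loop runs.
        Application.Run()
    End Sub
End Module
```

In a WinForm project the same idea applies without `Application.Run`: move the `GetElementById` call into a `DocumentCompleted` handler instead of running it right after setting `Url`.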


Answers (1)

Tim Schmelter


I would use HtmlAgilityPack instead.

You commented: "True, but overly complex for my simple task of extracting a few elements by ID."

It also has a GetElementbyId method on its HtmlDocument, which is simple to use. Because it only parses the markup and never runs the page's scripts, it has none of the script-error or truncation problems you see with the WebBrowser control. You can load the document from the web, a stream, a file, or a plain string.

For example, loading from the web:

    Imports System.Net
    Imports HtmlAgilityPack

    Dim document As New HtmlAgilityPack.HtmlDocument()
    Dim myHttpWebRequest = CType(WebRequest.Create("URL"), HttpWebRequest)
    myHttpWebRequest.UserAgent = "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)"
    ' Request the page once and let the parser read the response stream directly.
    Using res As HttpWebResponse = CType(myHttpWebRequest.GetResponse(), HttpWebResponse)
        document.Load(res.GetResponseStream(), True)
    End Using

    Dim node As HtmlNode = document.GetElementbyId("userParts")

or from a file:

    document.Load("Path")

or from a string (for example, a whole web page read from an HTML file with File.ReadAllText):

    document.LoadHtml("HTML")
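For the original use case (a page saved to disk), the file and string options can be combined into a small helper. This is a minimal sketch assuming the HtmlAgilityPack NuGet package is referenced; the module name, the helper name `GetInnerTextById`, and the file path are hypothetical. Unlike a plain string search, the parser finds the element by its actual `id` attribute and returns the text of all its descendants. `HtmlAgilityPack.HtmlDocument` is fully qualified because WinForms projects also import `System.Windows.Forms.HtmlDocument`.

```vbnet
Imports System
Imports System.IO
Imports HtmlAgilityPack

Module ReadSavedPageWithHap
    ' Hypothetical helper: returns the trimmed inner text of the element with
    ' the given id, or an empty string when no such element exists.
    Function GetInnerTextById(html As String, elemId As String) As String
        Dim document As New HtmlAgilityPack.HtmlDocument()
        ' LoadHtml only parses the markup; scripts in the page are never executed.
        document.LoadHtml(html)

        Dim node As HtmlNode = document.GetElementbyId(elemId)
        If node Is Nothing Then Return ""

        Return node.InnerText.Trim()
    End Function

    Sub Main()
        ' Hypothetical path to a page saved with File > Save As...
        Dim html As String = File.ReadAllText("C:\temp\saved-page.html")
        Console.WriteLine(GetInnerTextById(html, "userParts"))
    End Sub
End Module
```

Depending on the HtmlAgilityPack version, `InnerText` may leave HTML entities such as `&amp;` encoded; `HtmlEntity.DeEntitize` can decode them if the page uses any.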

