Reputation: 2297
I have some files that were displayed in a browser, and I then used File, Save As... to save the text to a local file. The page has some scripting and will not display properly in a WebBrowser control on a WinForm. The problem appears to be the scripts, as the control displays "script error" dialogs. I don't really need to view the file, just to retrieve a few elements by ID.
The first block of code below does load the file into a local object, but only the first 4096 bytes. (Same happens if I use a WebBrowser resident on the form.)
The second block doesn't complain, but GetElementById fails because the desired element lies beyond the first 4096 bytes.
Dim web As New WebBrowser
web.AllowWebBrowserDrop = False
web.ScriptErrorsSuppressed = True
web.Url = New Uri(sFile)
Dim doc As HtmlDocument
Dim elem As HtmlElement
doc = web.Document
elem = doc.GetElementById("userParts")
What am I doing wrong?
Is there a better approach for a VB.Net WinForm project for loading an HTML document from which I can read elements?
I just went with string functions for the simple task at hand:
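Function GetInnerTextByID(html As String, elemID As String) As String
    Try
        ' Skip the head so that matches in scripts or styles are ignored.
        ' Searching for "<body" also matches a body tag that carries attributes.
        Dim s As String = html.Substring(html.IndexOf("<body"))
        ' Jump to the first occurrence of the ID, then past the end of its opening tag.
        s = s.Substring(s.IndexOf(elemID))
        s = s.Substring(s.IndexOf(">") + 1)
        ' Keep everything up to the next tag.
        s = s.Substring(0, s.IndexOf("<"))
        s = s.Replace(vbCr, "").Replace(vbLf, "").Trim()
        Return s
    Catch ex As Exception
        ' Any failure (for example a marker that is not found) yields an empty string.
        Return ""
    End Try
End Function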
I'd still be interested in a native VB.Net (non-ASP) approach, or in why the code in the original post only loads 4096 bytes.
Upvotes: 1
Views: 11195
Reputation: 460288
I would use HtmlAgilityPack instead.
You: "True - but overly complex for my simple task of extracting a few elements by ID."
It also has a document.GetElementbyId method, which is rather simple to use. And it has no strange issues with scripts or byte limits. Just load the document from the web, a stream, a file or a plain string.
For example (web):
' Requires: Imports System.Net and Imports HtmlAgilityPack
Dim document As New HtmlAgilityPack.HtmlDocument
Dim myHttpWebRequest = CType(WebRequest.Create("URL"), HttpWebRequest)
myHttpWebRequest.UserAgent = "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)"
' Request the page once and let HtmlAgilityPack parse the response stream.
Using res = CType(myHttpWebRequest.GetResponse(), HttpWebResponse)
    document.Load(res.GetResponseStream(), True)
End Using
Dim node As HtmlNode = document.GetElementbyId("userParts")
or from file:
document.Load("Path")
or from a string (for example, a whole web page in an HTML file read with File.ReadAllText):
document.LoadHtml("HTML")
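Putting the string variant together for the original task (reading the text of an element by ID from a saved page), a minimal sketch might look like the following. It assumes the HtmlAgilityPack package is referenced in the project; the helper name GetInnerTextById and the module name are made up for illustration, and "Path" is still a placeholder for the saved file.

```vb
Imports System.IO
Imports HtmlAgilityPack

Module ExtractById
    ' Hypothetical helper: returns the trimmed inner text of the element
    ' with the given id, or an empty string when no such element exists.
    Function GetInnerTextById(html As String, elemId As String) As String
        ' Fully qualified to avoid a clash with System.Windows.Forms.HtmlDocument.
        Dim document As New HtmlAgilityPack.HtmlDocument()
        document.LoadHtml(html)
        Dim node As HtmlNode = document.GetElementbyId(elemId)
        If node Is Nothing Then Return ""
        Return node.InnerText.Trim()
    End Function

    Sub Main()
        ' "Path" is a placeholder for the saved page.
        Dim html As String = File.ReadAllText("Path")
        Console.WriteLine(GetInnerTextById(html, "userParts"))
    End Sub
End Module
```

Because the HTML is only parsed and never rendered, the page's scripts are not executed, so no script error dialogs should appear.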
Upvotes: 3