Reputation: 143
I successfully display a web site in WebView2 in my VB.NET (Visual Studio 2017) project, but I cannot get the HTML source code. Please advise me how to get the HTML code.
My code:
Private Sub testbtn_Click(sender As Object, e As EventArgs) Handles testbtn.Click
WebView2.CoreWebView2.Navigate("https://www.microsoft.com/")
End Sub
Private Sub WebView2_NavigationCompleted(sender As Object, e As CoreWebView2NavigationCompletedEventArgs) Handles WebView2.NavigationCompleted
Dim html As String = ?????
End Sub
Thank you in advance for your advice.
Upvotes: 12
Views: 25878
Reputation: 1
Form1 As Form, with these controls:
Button1 As Button
WV1 As WebView2
TextBox1 As TextBox
Imports System.Threading.Tasks
Imports Microsoft.Web.WebView2.Core

Public Class Form1
    Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
        ' Navigate the WebView2 control to the page.
        WV1.Source = New Uri("https://www.google.com/")
    End Sub

    Private Async Sub WV1_NavigationCompleted(sender As Object, e As CoreWebView2NavigationCompletedEventArgs) Handles WV1.NavigationCompleted
        ' Await the helper so any exception surfaces here instead of being lost.
        Await GetPage2InfoAsync()
    End Sub

    Private Async Function GetPage2InfoAsync() As Task
        Dim DateStr As String
        ' Get the page markup. ExecuteScriptAsync returns the script result JSON-encoded.
        DateStr = Await WV1.ExecuteScriptAsync("document.documentElement.outerHTML")
        ' Raise the TextBox limit so the whole page source fits.
        TextBox1.MaxLength = DateStr.Length + 1000
        TextBox1.Text = DateStr
    End Function
End Class
One thing I found out: a TextBox defaults to a maximum length of 32,767 characters (32K), but a lot of page source is around 2 to 3 MB, so at first I set my TextBox MaxLength to 5000000.
I then added a line to cure that problem. Under
DateStr = Await WV1.ExecuteScriptAsync("document.documentElement.outerHTML")
I added
TextBox1.MaxLength = DateStr.Length + 1000
which sets the TextBox maximum length to the returned length plus 1000 characters.
Upvotes: 0
Reputation: 10940
The accepted answer is on the right track. However, it's missing one important thing: the returned string is NOT HTML-encoded, it's JSON!
So to do it right, you need to deserialize the JSON, which is just as simple:
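' Requires Imports System.Text.Json (built into .NET Core 3.0+; a NuGet package on .NET Framework)
Dim html As String
html = Await WebView2.ExecuteScriptAsync("document.documentElement.outerHTML;")
' The result is a JSON string literal, so deserialize it back to plain HTML.
' DeserializeAsync expects a Stream, so use the string overload of Deserialize here.
html = JsonSerializer.Deserialize(Of String)(html)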
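A minimal C# sketch of the same idea that runs as a .NET 5+ console program without WebView2. The `raw` string is a made-up sample shaped like an `ExecuteScriptAsync` result (a quoted JSON string with `<` written as `\u003C`), not output captured from a real page, and `DecodeScriptResult` is a hypothetical helper name. `JsonSerializer.Deserialize<string>` removes the quotes and the escapes in one call.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper: undo the JSON encoding that ExecuteScriptAsync applies
// to the script result. outerHTML arrives as a quoted JSON string literal,
// so deserializing it as a string removes the quotes and escapes in one step.
static string DecodeScriptResult(string json) =>
    // The JSON text "null" deserializes to a null string; return "" instead.
    JsonSerializer.Deserialize<string>(json) ?? string.Empty;

// Made-up sample shaped like an outerHTML result (not captured from a real page).
// Verbatim string: "" is one quote, backslash sequences stay literal.
string raw = @"""\u003Chtml lang=\""en\"">\u003Cbody>Tom \u0026 Jerry\u003C/body>\u003C/html>""";

string html = DecodeScriptResult(raw);
Console.WriteLine(html); // <html lang="en"><body>Tom & Jerry</body></html>
```

In a real handler you would pass the awaited `ExecuteScriptAsync` result to the helper instead of the sample. Mapping `null` to an empty string is just a choice made for this sketch; the WebView2 documentation describes undefined or non-serializable results coming back as the text `null`.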
Upvotes: 2
Reputation: 155
I must credit @Xaviorq8; his answer was needed to solve my problem. I was successfully using the .NET WebBrowser control with Html Agility Pack (HAP), but I wanted to replace WebBrowser with WebView2.
Original code with WebBrowser:
using HAP = HtmlAgilityPack;
var hapHtmlDocument = new HAP.HtmlDocument();
hapHtmlDocument.Load(webBrowser1.DocumentStream);
HAP.HtmlNodeCollection nodes = hapHtmlDocument.DocumentNode.SelectNodes("//*[@id=\"apptAndReportsTbl\"]");
Straight swap to WebView2 (not enough on its own, because the returned string is still quoted and escaped):
using HAP = HtmlAgilityPack;
string html = await webView21.ExecuteScriptAsync("document.documentElement.outerHTML");
var hapHtmlDocument = new HAP.HtmlDocument();
hapHtmlDocument.LoadHtml(html);
HAP.HtmlNodeCollection nodes = hapHtmlDocument.DocumentNode.SelectNodes("//*[@id=\"apptAndReportsTbl\"]");
Working version with WebView2:
using System.Text.RegularExpressions;
using HAP = HtmlAgilityPack;
string html = await webView21.ExecuteScriptAsync("document.documentElement.outerHTML");
// thanks to @Xaviorq8's answer (next 3 lines): unescape, then strip the wrapping quotes
html = Regex.Unescape(html);
html = html.Remove(0, 1);
html = html.Remove(html.Length - 1, 1);
var hapHtmlDocument = new HAP.HtmlDocument();
hapHtmlDocument.LoadHtml(html);
HAP.HtmlNodeCollection nodes = hapHtmlDocument.DocumentNode.SelectNodes("//*[@id=\"apptAndReportsTbl\"]");
Upvotes: 1
Reputation: 7162
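Adding to @Xaviorq8's answer, you can slice a Span instead of calling Remove twice. This avoids the intermediate string that the first Remove creates (ToString() still allocates the final string once):
html = Regex.Unescape(html);
html = html.AsSpan()[1..^1].ToString();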
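A small C# console sketch (C# 8+ and .NET Core 3.0+ for ranges) that trims the quotes three ways on a made-up, already-unescaped sample and checks that they agree. The comments note where each approach allocates; the variable names are illustrative only.

```csharp
using System;

// Made-up stand-in for a result already passed through Regex.Unescape:
// the HTML is still wrapped in the JSON string's double quotes.
string unescaped = "\"<p class=\"x\">hi</p>\"";

// Two Remove calls: the first creates an intermediate string, the second the final one.
string viaRemove = unescaped.Remove(0, 1);
viaRemove = viaRemove.Remove(viaRemove.Length - 1, 1);

// Slicing the span copies nothing; only ToString() allocates the result.
string viaSpan = unescaped.AsSpan()[1..^1].ToString();

// A range applied directly to a string compiles to Substring: also one allocation.
string viaRange = unescaped[1..^1];

Console.WriteLine(viaRemove == viaSpan && viaSpan == viaRange); // True
```

For a one-off page grab the difference is a single extra string, so `html[1..^1]` is arguably the simplest of the three to read.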
Upvotes: 3
Reputation: 366
I only started working with WebView2 earlier today as well, and was looking for the same thing. I managed to put together this solution:
' Requires Imports System.Text.RegularExpressions
Dim html As String
html = Await WebView2.ExecuteScriptAsync("document.documentElement.outerHTML;")
' The HTML comes back with Unicode character codes, other escaped characters, and
' wrapped in double quotes, so I'm using this code to clean it up for what I'm doing.
html = Regex.Unescape(html)
html = html.Remove(0, 1)
html = html.Remove(html.Length - 1, 1)
I converted my code from C# to VB on the fly, so hopefully I didn't miss any syntax errors.
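For comparison, a C# sketch of this cleanup (the answer says it was originally C#), checked against a proper JSON decode with `System.Text.Json` on a made-up sample. `CleanWithRegex` is a hypothetical name, and the sample is illustrative rather than captured from a real page.

```csharp
using System;
using System.Text.Json;
using System.Text.RegularExpressions;

// Hypothetical C# equivalent of the VB cleanup: unescape, then drop the wrapping quotes.
static string CleanWithRegex(string raw)
{
    string html = Regex.Unescape(raw);         // turns \u003C into <, \" into ", \n into a newline
    return html.Substring(1, html.Length - 2); // strip the leading and trailing double quote
}

// Made-up sample of a JSON-encoded outerHTML result (verbatim string: "" is one quote).
string raw = @"""\u003Cdiv class=\""note\"">a \u0026 b\n\u003C/div>""";

string viaRegex = CleanWithRegex(raw);
string viaJson = JsonSerializer.Deserialize<string>(raw) ?? string.Empty;

Console.WriteLine(viaRegex == viaJson); // True
```

One edge case worth knowing: if the script result is the JSON text `null`, the quote-stripping approach returns `ul`, whereas `JsonSerializer.Deserialize<string>` returns `null`.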
Upvotes: 35