Tom

Reputation: 143

VB.NET WebView2: How can I get the HTML source code?

I successfully display a website in a WebView2 control in my VB.NET (Visual Studio 2017) project, but I cannot get the HTML source code. Please advise me on how to get the HTML.

My code:

Private Sub testbtn_Click(sender As Object, e As EventArgs) Handles testbtn.Click
        WebView2.CoreWebView2.Navigate("https://www.microsoft.com/")
End Sub

Private Sub WebView2_NavigationCompleted(sender As Object, e As CoreWebView2NavigationCompletedEventArgs) Handles WebView2.NavigationCompleted
        Dim html As String = ?????
End Sub

Thank you in advance for your advice.

Upvotes: 12

Views: 25878

Answers (5)

smitty smitty

Reputation: 1

Components:

Form1 As Form
Button1 As Button
WV1 As WebView2
TextBox1 As TextBox

Imports Microsoft.Web.WebView2.Core

Public Class Form1
    Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
        WV1.Source = New Uri("https://www.google.com/")
    End Sub

    Private Async Sub WV1_NavigationCompleted(sender As Object, e As CoreWebView2NavigationCompletedEventArgs) Handles WV1.NavigationCompleted
        ' Await the task so that any exception surfaces instead of being silently dropped.
        Await GetPage2InfoAsync()
    End Sub

    Private Async Function GetPage2InfoAsync() As Task
        Dim DateStr As String
        DateStr = Await WV1.ExecuteScriptAsync("document.documentElement.outerHTML")
        TextBox1.MaxLength = DateStr.Length + 1000
        TextBox1.Text = DateStr
    End Function
End Class

One thing I found out: a TextBox defaults to a maximum length of about 32K characters, and a lot of page sources run 2 to 3 MB. At first I set my TextBox's MaxLength to 5000000.

Then I added a line to handle that properly. Under

DateStr = Await WV1.ExecuteScriptAsync("document.documentElement.outerHTML")

I added

TextBox1.MaxLength = DateStr.Length + 1000

That sets the TextBox's maximum length to the returned length plus 1000 characters.

Upvotes: 0

Poul Bak

Reputation: 10940

The accepted answer is on the right track. However, it's missing one important thing:

The returned string is NOT HTMLEncoded, it's JSON!

So to do it right, you need to deserialize the JSON, which is just as simple:

Dim html As String
html = Await WebView2.ExecuteScriptAsync("document.documentElement.outerHTML;")
' Requires Imports System.Text.Json. The result is a JSON string literal, so decode it.
html = JsonSerializer.Deserialize(Of String)(html)
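
To see the JSON decoding step in isolation, here is a minimal console sketch that needs no WebView2 control. The raw string is a hand-written stand-in for an ExecuteScriptAsync result (an assumption for illustration, not captured from a real page), and the module name JsonResultDemo is hypothetical.

```vb
Imports System
Imports System.Text.Json

Module JsonResultDemo
    Sub Main()
        ' Hypothetical sample: HTML encoded as a JSON string literal, i.e. wrapped in
        ' double quotes, with "<" written as \u003C and a line break written as \n.
        Dim raw As String = """\u003Cp>Line 1\nLine 2\u003C/p>"""

        ' Deserialize(Of String) removes the outer quotes and resolves the JSON escapes.
        Dim html As String = JsonSerializer.Deserialize(Of String)(raw)

        Console.WriteLine(html)
    End Sub
End Module
```

Letting the JSON parser do the work means you don't have to strip the quotes or handle each escape sequence by hand.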

Upvotes: 2

Ken Smith

Reputation: 155

I must credit @Xaviorq8; his answer was needed to solve my problem. I was successfully using .NET WebBrowser and Html Agility Pack but I wanted to replace WebBrowser with .NET WebView2.

Snippet (working code with WebBrowser):

using HAP = HtmlAgilityPack;
HAP.HtmlDocument hapHtmlDocument = null;
hapHtmlDocument = new HAP.HtmlDocument();
hapHtmlDocument.Load(webBrowser1.DocumentStream);
HtmlNodeCollection nodes = hapHtmlDocument.DocumentNode.SelectNodes("//*[@id=\"apptAndReportsTbl\"]");

Snippet (failing code with WebView2):

using HAP = HtmlAgilityPack;
HAP.HtmlDocument hapHtmlDocument = null;
string html = await webView21.ExecuteScriptAsync("document.documentElement.outerHTML");
hapHtmlDocument = new HAP.HtmlDocument();
hapHtmlDocument.LoadHtml(html);
HtmlNodeCollection nodes = hapHtmlDocument.DocumentNode.SelectNodes("//*[@id=\"apptAndReportsTbl\"]");

Snippet (working code with WebView2 and Html Agility Pack):

using HAP = HtmlAgilityPack;
HAP.HtmlDocument hapHtmlDocument = null;
string html = await webView21.ExecuteScriptAsync("document.documentElement.outerHTML");
// thanks to @Xaviorq8 answer (next 3 lines)
html = Regex.Unescape(html);
html = html.Remove(0, 1);
html = html.Remove(html.Length - 1, 1);
hapHtmlDocument = new HAP.HtmlDocument();
hapHtmlDocument.LoadHtml(html);
HtmlNodeCollection nodes = hapHtmlDocument.DocumentNode.SelectNodes("//*[@id=\"apptAndReportsTbl\"]");

Upvotes: 1

JohnyL

Reputation: 7162

Adding to @Xaviorq8's answer: in C#, you can slice a Span to avoid the intermediate strings that the two Remove calls create:

html = Regex.Unescape(html);
html = html.AsSpan()[1..^1].ToString();

Upvotes: 3

Xaviorq8

Reputation: 366

I've only just started messing with the WebView2 earlier today as well, and was just looking for this same thing. I did manage to scrape together this solution:

Dim html As String
html = Await WebView2.ExecuteScriptAsync("document.documentElement.outerHTML;")

' The Html comes back with unicode character codes, other escaped characters, and
' wrapped in double quotes, so I'm using this code to clean it up for what I'm doing.
html = Regex.Unescape(html)
html = html.Remove(0, 1)
html = html.Remove(html.Length - 1, 1)

I converted my code from C# to VB on the fly, so hopefully I didn't introduce any syntax errors.
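
As a rough illustration of what those three cleanup lines do, here is a small console sketch run against a hand-written sample string. The sample and the module name UnescapeDemo are assumptions for illustration only; a real result from ExecuteScriptAsync will be far longer.

```vb
Imports System
Imports System.Text.RegularExpressions

Module UnescapeDemo
    Sub Main()
        ' Hypothetical raw result: quoted, with "<" as \u003C and inner quotes as \".
        Dim html As String = """\u003Cp class=\""x\"">Hi\u003C/p>"""

        html = Regex.Unescape(html)             ' resolves the \u003C and \" escapes
        html = html.Remove(0, 1)                ' drops the leading double quote
        html = html.Remove(html.Length - 1, 1)  ' drops the trailing double quote

        Console.WriteLine(html)
    End Sub
End Module
```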

Upvotes: 35
