Reputation: 2688
So, I've got a WPF form that goes out to a site, parses the HTML, and returns a strongly typed list of the 'href' values. (Yes, this is for my own website.)
I am using a BackgroundWorker to keep the UI from hanging and to render a running progress bar.
It works great with just the first page of the site, but if I decide to recurse the site, the progress bar hangs while the recursion is happening. Once the recursion is through, the progress bar comes back to life.
Can you tell me what I am doing wrong here, and possibly direct me to the proper usage of the BackgroundWorker with the progress bar? Basically, the progress bar should run while the task is being performed, but based on the code I assume this isn't really the case.
Here's the code-behind for the window that this is being done in:
Imports System.Threading.Tasks
Imports System.Threading
Class MainWindow
Private _previousCursor As Cursor = Mouse.OverrideCursor
Private _Spider As New Spider.SpiderIt
Private _Worker As New ComponentModel.BackgroundWorker
Private _RunCount As Integer = 0
Private Sub MainWindow_Loaded(sender As Object, e As System.Windows.RoutedEventArgs) Handles Me.Loaded
Me.workProgress.Visibility = Windows.Visibility.Hidden
_Worker.WorkerReportsProgress = True
_Worker.WorkerSupportsCancellation = True
AddHandler _Worker.DoWork, New System.ComponentModel.DoWorkEventHandler(AddressOf Spider)
AddHandler _Worker.ProgressChanged, New System.ComponentModel.ProgressChangedEventHandler(AddressOf worker_ProgressChanged)
AddHandler _Worker.RunWorkerCompleted, New System.ComponentModel.RunWorkerCompletedEventHandler(AddressOf worker_RunWorkerCompleted)
End Sub
Private Sub SiteParseKeyDown(sender As System.Object, e As System.Windows.Input.KeyEventArgs)
If (e.Key = Key.Return) Then
Me.btnParseAll.IsEnabled = False
Me.btnParseSelected.IsEnabled = False
Me.SiteParse.IsEnabled = False
Mouse.OverrideCursor = Cursors.Wait
Me.workProgress.Visibility = Windows.Visibility.Visible
_Worker.RunWorkerAsync(New Typing() With {.Url = SiteParse.Text, .Recurse = Recurse.IsChecked})
End If
End Sub
Private Sub btnParseAll_Click(sender As Object, e As System.Windows.RoutedEventArgs) Handles btnParseAll.Click
Me.btnParseAll.IsEnabled = False
Me.btnParseSelected.IsEnabled = False
Me.SiteParse.IsEnabled = False
Dim _TL As New List(Of DGTyping)
Using New WaitCursor
For Each Item In DG_SiteLinks.Items
_TL.Add(New DGTyping() With {
.SiteUrl = Item.SiteUrl,
.SiteTitle = Item.SiteTitle
})
Next
End Using
Dim _T As New ParseLinks(Me, _TL)
End Sub
Private Sub btnParseSelected_Click(sender As Object, e As System.Windows.RoutedEventArgs) Handles btnParseSelected.Click
Me.btnParseAll.IsEnabled = False
Me.btnParseSelected.IsEnabled = False
Me.SiteParse.IsEnabled = False
Dim _TL As New List(Of DGTyping)
Using New WaitCursor
For Each Item In DG_SiteLinks.SelectedItems
_TL.Add(New DGTyping() With {
.SiteUrl = Item.SiteUrl,
.SiteTitle = Item.SiteTitle
})
Next
End Using
Dim _T As New ParseLinks(Me, _TL)
End Sub
#Region "Get Site Links"
Private Sub Spider(sender As Object, e As System.ComponentModel.DoWorkEventArgs)
'Do the work here, but need to get the value of SiteParse first
With _Spider
.UrlToParse = DirectCast(e.Argument.Url, String)
.ShouldRecurse = DirectCast(e.Argument.Recurse, Boolean)
.RecurseLevels = 20
End With
End Sub
Private Sub worker_ProgressChanged(sender As Object, e As System.ComponentModel.ProgressChangedEventArgs)
workProgress.Value = e.ProgressPercentage
End Sub
Private Sub worker_RunWorkerCompleted(sender As Object, e As System.ComponentModel.RunWorkerCompletedEventArgs)
Dim _IL As List(Of Spider.Typing.InternalLinks)
_IL = _Spider.InternalLinks()
Dim _TL As New List(Of DGTyping)
For Each item In _IL
_TL.Add(New DGTyping() With {
.SiteUrl = item.Url,
.SiteTitle = If(item.Title.Length > 0, item.Title, item.Content)
})
Next
Me.DG_SiteLinks.ItemsSource = _TL
End Sub
Private Sub BrowseSite(sender As Object, e As RoutedEventArgs)
Dim _URL As String = DirectCast(sender, TextBlock).Text
Dim _T As New Browser(_URL)
End Sub
Private Sub Window_Closing(sender As Object, e As System.ComponentModel.CancelEventArgs)
If _Worker IsNot Nothing Then
If _Worker.IsBusy Then
End If
End If
End Sub
Private Sub EndSync()
End Sub
Private Sub EndRest()
workProgress.Value = 0
workProgress.Visibility = Windows.Visibility.Hidden
Me.btnParseAll.IsEnabled = True
Me.btnParseSelected.IsEnabled = True
Me.SiteParse.IsEnabled = True
Mouse.OverrideCursor = _previousCursor
End Sub
Partial Public Class Typing
Public Property Url As String
Public Property Recurse As Boolean
End Class
Partial Public Class DGTyping
Public Property SiteUrl As String
Public Property SiteTitle As String
End Class
#End Region
End Class
.SpiderIt() goes out to the site specified, grabs the HTML as an HDocument (SuperstarCoders LinqToHtml), parses it for internal links, and puts them into a strongly typed list. This is done in a separate class assembly, and performs perfectly.
SpiderIt method and containing class:
Imports Superstar.Html.Linq
Imports System.Threading.Tasks
Public Class SpiderIt
Implements IDisposable
#Region "Public Properties"
''' <summary>
''' Specify the initial URL to parse
''' </summary>
''' <value></value>
''' <returns></returns>
''' <remarks></remarks>
Public Property UrlToParse As String
''' <summary>
''' Should this recurse the internal links of the site
''' </summary>
''' <value></value>
''' <returns></returns>
''' <remarks></remarks>
Public Property ShouldRecurse As Boolean = False
''' <summary>
''' Specify the number of levels to recurse
''' </summary>
''' <value></value>
''' <returns></returns>
''' <remarks></remarks>
Public Property RecurseLevels As Long = 0
''' <summary>
''' Returns a message from the SpiderIt method
''' </summary>
''' <value></value>
''' <returns></returns>
''' <remarks></remarks>
Public ReadOnly Property Message() As String
Get
Return _Msg
End Get
End Property
''' <summary>
''' Returns a strongly typed list of internal links
''' </summary>
''' <value></value>
''' <returns></returns>
''' <remarks></remarks>
Public ReadOnly Property InternalLinks() As List(Of Typing.InternalLinks)
Get
Return _InternalLinkList
End Get
End Property
''' <summary>
''' Returns a strongly typed list of external links
''' </summary>
''' <value></value>
''' <returns></returns>
''' <remarks></remarks>
Public ReadOnly Property ExternalLinks() As List(Of Typing.ExternalLinks)
Get
Return _ExternalLinkList
End Get
End Property
#End Region
#Region "Internal Properties"
Private disposedValue As Boolean
Private _Msg As String
Private _Ctr As Long = 0
Private _InternalLinkList As New List(Of Typing.InternalLinks)
Private _ExternalLinkList As New List(Of Typing.ExternalLinks)
Private _DLer As New Downloader
Private _RCt As Long = 0
#End Region
#Region "Public Methods"
''' <summary>
''' Parse with the specified values
''' </summary>
''' <returns>Boolean</returns>
''' <remarks>Returns true or false, based on if it has completed, as well as a message
''' Spits out 2 strongly typed lists. Both internal and external URLs
''' </remarks>
Public Function SpiderIt(ByVal _Worker) As Boolean
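For i As Integer = 1 To 99
System.Threading.Thread.Sleep(50)
_Worker.ReportProgress(i)
Next
_Worker.ReportProgress(100)
Dim _Doc As HDocument = _DLer.DownloadHDoc(UrlToParse)
With _Doc
If _Doc Is Nothing Then
_Msg = "There is no document to parse."
Return False
Else
Try
Dim _AL = .Descendants("a")
'Parse the internal links
ParseLinks(_AL)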
_Msg = "Internal Link List Built"
Return True
Catch ex As Exception
_Msg = ex.Message
Return False
End Try
End If
End With
End Function
#End Region
#Region "Internal Methods"
#Region "Spider Helpers"
Private Sub ParseLinks(ByVal _AL As IEnumerable(Of HElement))
Dim _Link As String, _D As HDocument
For i As Long = 0 To _AL.Count - 1
If _AL(i).Attribute("href") IsNot Nothing AndAlso Not (_AL(i).Attribute("href").Value.Contains("//") OrElse
_AL(i).Attribute("href").Value.Contains("http://") OrElse
_AL(i).Attribute("href").Value.Contains("https://") OrElse
_AL(i).Attribute("href").Value.Contains("ftp://") OrElse
_AL(i).Attribute("href").Value.Contains("mailto:") OrElse
_AL(i).Attribute("href").Value.Contains("#")) Then
_Link = UrlToParse & "/" & _AL(i).Attribute("href").Value
If Not (_InternalLinkList.Any(Function(x) x.Url = _Link.Replace("//", "/").Replace("http:/", "http://").Replace("https:/", "https://"))) Then
AddInternalLinks(_Link.Replace("//", "/").Replace("http:/", "http://").Replace("https:/", "https://"),
If(_AL(i).Attribute("target") Is Nothing,
If(_AL(i).Attribute("title") Is Nothing,
If ShouldRecurse Then
_RCt += 1
If _RCt <= RecurseLevels Then
_D = _DLer.DownloadHDoc(_Link)
End If
End If
End If
_Link = _AL(i).Attribute("href").Value
If Not (_ExternalLinkList.Any(Function(x) x.Url = _Link)) Then
If(_AL(i).Attribute("target") Is Nothing,
If(_AL(i).Attribute("title") Is Nothing,
End If
End If
Catch ex As Exception
_Msg += ex.StackTrace
End Try
End Sub
Private Sub AddExternalLinks(ByVal _Link As String, ByVal _Target As String, ByVal _Content As String, ByVal _Title As String)
Try
_ExternalLinkList.Add(New Typing.ExternalLinks With {
.Url = _Link,
.Content = _Content,
.Target = _Target,
.Title = _Title
})
Catch ex As Exception
_Msg += ex.StackTrace
End Try
End Sub
Private Sub AddInternalLinks(ByVal _Link As String, ByVal _Target As String, ByVal _Content As String, ByVal _Title As String)
Try
_InternalLinkList.Add(New Typing.InternalLinks With {
.Url = _Link,
.Content = _Content,
.Target = _Target,
.Title = _Title
})
Catch ex As Exception
_Msg += ex.StackTrace
End Try
End Sub
#End Region
#Region "IDisposable Support"
Protected Overridable Sub Dispose(disposing As Boolean)
If Not Me.disposedValue Then
If disposing Then
End If
_Msg = String.Empty
End If
Me.disposedValue = True
End Sub
Public Sub Dispose() Implements IDisposable.Dispose
End Sub
#End Region
#End Region
End Class
Upvotes: 2
Views: 635
Reputation: 698
I am posting this as an answer because the text was too long for a comment.
Maybe I am missing something (reading the code in this format is a pain).
You count from 1 to 99 and report that progress every 50 milliseconds. Nothing else seems to happen in between, I mean no workload that would add real delays. Then you report 100%, and only then does the code seem to actually load the document and parse it, which I guess takes a while.
Shouldn't you call ReportProgress() somewhere inside the ParseLinks() method? Of course, you'll have to be able to calculate how many nodes you'll parse, so that you can report progress at a pace that reaches 100% just as the work is done.
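To illustrate the idea, here is a minimal console sketch, not the asker's actual code. `ProcessLink`, `ParseWithProgress`, and the sample link names are hypothetical stand-ins; the point is only that ReportProgress() is called from inside the loop that does the work, so the bar advances as items are finished.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.ComponentModel
Imports System.Threading

Module ProgressSketch
    ' Hypothetical stand-in for the per-link work done in ParseLinks().
    Private Sub ProcessLink(link As String)
        Thread.Sleep(20) ' simulate a download/parse delay
    End Sub

    ' Does the work and reports progress after each item, from inside the loop.
    Public Sub ParseWithProgress(worker As BackgroundWorker, links As IList(Of String))
        For i As Integer = 0 To links.Count - 1
            If worker.CancellationPending Then Exit Sub
            ProcessLink(links(i))
            worker.ReportProgress(CInt((i + 1) * 100 / links.Count))
        Next
    End Sub

    Sub Main()
        Dim links As New List(Of String) From {"a.html", "b.html", "c.html", "d.html"}
        Dim finished As New ManualResetEvent(False)
        Dim worker As New BackgroundWorker With {
            .WorkerReportsProgress = True,
            .WorkerSupportsCancellation = True
        }
        ' The real work runs on the worker thread and reports as it goes.
        AddHandler worker.DoWork, Sub(s, e) ParseWithProgress(worker, links)
        AddHandler worker.ProgressChanged, Sub(s, e) Console.WriteLine(e.ProgressPercentage & "%")
        AddHandler worker.RunWorkerCompleted, Sub(s, e) finished.Set()
        worker.RunWorkerAsync()
        finished.WaitOne()
    End Sub
End Module
```

In a WPF window the ProgressChanged handler is raised on the UI thread, so it can set workProgress.Value directly. In a console app like this one the events arrive on thread-pool threads, so the order of the printed lines is not guaranteed.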
Write another recursive method that only calculates the number of nodes ahead of time (that should be quick). Armed with that number, you will know the value to pass to ReportProgress() (which, again, you should call inside ParseLinks()), so you'll have steady progress up to 100%. Obviously, you'll also have to pass a reference to the BackgroundWorker to ParseLinks().
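A rough sketch of that two-pass approach follows, under the assumption that the pages can be represented as a tree. `PageNode`, `CountNodes`, and `Visit` are hypothetical names invented for this example; in the real code the `report` callback would be the worker's ReportProgress().

```vbnet
Imports System
Imports System.Collections.Generic

' Hypothetical page tree; the real code would walk downloaded documents instead.
Public Class PageNode
    Public Property Url As String
    Public Property Children As New List(Of PageNode)
End Class

Module CountThenReport
    ' First pass: count every node so the total is known before any work starts.
    Public Function CountNodes(node As PageNode) As Integer
        Dim total As Integer = 1
        For Each child In node.Children
            total += CountNodes(child)
        Next
        Return total
    End Function

    ' Second pass: visit each node and report an integer percentage after each one.
    Public Sub Visit(node As PageNode, total As Integer, ByRef done As Integer, report As Action(Of Integer))
        done += 1
        report(done * 100 \ total)
        For Each child In node.Children
            Visit(child, total, done, report)
        Next
    End Sub

    Sub Main()
        Dim root As New PageNode With {.Url = "/"}
        root.Children.Add(New PageNode With {.Url = "/a"})
        root.Children.Add(New PageNode With {.Url = "/b"})
        root.Children(0).Children.Add(New PageNode With {.Url = "/a/1"})

        Dim total As Integer = CountNodes(root)
        Dim done As Integer = 0
        Visit(root, total, done, Sub(p) Console.WriteLine(p & "%"))
    End Sub
End Module
```

One caveat: for a crawler, the full count may not be known without fetching the pages first, so the counting pass might only be cheap for the links already found on a downloaded page.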
It may be difficult but nobody said it's going to be easy :D.
Upvotes: 1