Reputation: 37
I've written this code to extract cell phone numbers from a website. It extracts the numbers correctly, but very slowly, and it also hangs my Form while extracting. Please help me make it run faster and more efficiently.
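Imports HtmlAgilityPack
Imports System.Text.RegularExpressions

Public Class Extractor
    ' Build the phone-number pattern once and reuse it for every page.
    Private Shared ReadOnly PhoneRegex As New Regex("(\+92|0092)-?\d{3}-?\d{7}|\d{11}|\d{4}-\d{7}")

    ' Adds every "/detail/" link found inside the "ad_list" element to ListBox1.
    Public Shared Function ScrapLinks(TextBox1 As TextBox, ListBox1 As ListBox, lbllinks As Label)
        Dim hw As New HtmlWeb()
        Try
            ' hw.Load is a blocking network call; it runs on whichever thread calls this method.
            Dim doc As HtmlAgilityPack.HtmlDocument = hw.Load(TextBox1.Text)
            ' SelectSingleNode and SelectNodes return Nothing when nothing matches.
            Dim adList As HtmlNode = doc.DocumentNode.SelectSingleNode("//*[@id='ad_list']")
            If adList Is Nothing Then Return Nothing
            Dim links As HtmlNodeCollection = adList.SelectNodes(".//a[@href]")
            If links Is Nothing Then Return Nothing
            For Each link As HtmlNode In links
                Dim hrefValue As String = link.GetAttributeValue("href", String.Empty)
                If hrefValue.Contains("/detail/") AndAlso Not ListBox1.Items.Contains(hrefValue) Then
                    ListBox1.Items.Add(hrefValue)
                End If
            Next
        Catch ex As Exception
            MsgBox("Error " & ex.Message)
        End Try
        Return Nothing
    End Function

    ' Adds the first phone number found on the selected detail page to lstnum.
    Public Shared Function Scrapnums(lstbox As ListBox, lstnum As ListBox)
        Try
            If lstbox.SelectedItem Is Nothing Then Return Nothing
            Dim hw As New HtmlWeb()
            Dim doc As HtmlAgilityPack.HtmlDocument = hw.Load(lstbox.SelectedItem.ToString())
            Dim adNode As HtmlNode = doc.DocumentNode.SelectSingleNode("//*[@class='det_ad f_left']")
            If adNode Is Nothing Then Return Nothing
            Dim m As Match = PhoneRegex.Match(adNode.InnerText)
            ' Only add real matches; a failed match has an empty Value.
            If m.Success AndAlso Not lstnum.Items.Contains(m.Value) Then
                lstnum.Items.Add(m.Value)
            End If
        Catch ex As Exception
            ' Failures are still ignored, but logged so they can be diagnosed.
            System.Diagnostics.Debug.WriteLine(ex.Message)
        End Try
        Return Nothing
    End Function
End Class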
Upvotes: 2
Views: 507
Reputation: 13679
Here is a regex to parse the phone numbers.
Regex
((?(?=\+92|0092)(?:\+|00)92\d?(?:-\d{3}-?\d{7}|-?\d{9}|\d{10})|(?:\d{11}|\d{3}\s\d{3}\s\d{5,6}|\d{4}-\d{7})))
Sample numbers
+92-3113143446
+923-113143446
032 124 26003
923 072 776037
03154031162
+923218923116
0307-2796038
+92-343-2842120
Result
+92-3113143446
+923-113143446
032 124 26003
923 072 776037
03154031162
+923218923116
0307-2796038
+92-343-2842120
The above regex is based on assumptions. It may match more patterns than those listed above, so it may need to be refined.
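For anyone who wants to try the pattern from VB.NET, here is a minimal sketch that runs it against the sample numbers listed above. The module name PhoneRegexDemo and the use of RegexOptions.Compiled are my own assumptions and not something the pattern requires. The conditional construct (?(?=...)yes|no) is supported by the .NET regex engine, but not by every other regex flavor.

```vbnet
Imports System
Imports System.Text.RegularExpressions

' Hypothetical demo module; the name is an assumption for illustration only.
Module PhoneRegexDemo
    ' Pattern copied verbatim from the answer. RegexOptions.Compiled is optional
    ' and is only used here because the same pattern is applied many times.
    Private ReadOnly PhonePattern As New Regex(
        "((?(?=\+92|0092)(?:\+|00)92\d?(?:-\d{3}-?\d{7}|-?\d{9}|\d{10})|(?:\d{11}|\d{3}\s\d{3}\s\d{5,6}|\d{4}-\d{7})))",
        RegexOptions.Compiled)

    Sub Main()
        ' The sample numbers from the answer.
        Dim samples As String() = {
            "+92-3113143446",
            "+923-113143446",
            "032 124 26003",
            "923 072 776037",
            "03154031162",
            "+923218923116",
            "0307-2796038",
            "+92-343-2842120"
        }

        For Each sample As String In samples
            Dim m As Match = PhonePattern.Match(sample)
            ' Check Success before using Value; a failed match has an empty Value.
            If m.Success Then
                Console.WriteLine("{0} -> {1}", sample, m.Value)
            Else
                Console.WriteLine("{0} -> no match", sample)
            End If
        Next
    End Sub
End Module
```

To pull every number out of a page rather than only the first one, PhonePattern.Matches(text) can be looped over in the same way.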
Upvotes: 1