Best Newspk

Reputation: 37

Extracting Phone Numbers Using Regex and HtmlAgilityPack in VB.NET

I've written this code to extract cell numbers from a website. It extracts the numbers correctly, but very slowly, and it also hangs my form while extracting. Please help me make it run faster and more efficiently.

Imports HtmlAgilityPack
Imports System.Text.RegularExpressions
Public Class Extractor

Shared doc As New HtmlAgilityPack.HtmlDocument()

Public Shared Function ScrapLinks(TextBox1 As TextBox, ListBox1 As ListBox, lbllinks As Label)
    Dim hw As New HtmlWeb()
    Try
        doc = hw.Load(TextBox1.Text)
        doc.LoadHtml(doc.DocumentNode.SelectSingleNode("//*[@id='ad_list']").InnerHtml())

        For Each link As HtmlNode In doc.DocumentNode.SelectNodes("//a[@href]")

            Dim hrefValue As String = link.GetAttributeValue("href", String.Empty)

            If hrefValue.Contains("/detail/") Then
                If Not ListBox1.Items.Contains(hrefValue) Then
                    ListBox1.Items.Add(hrefValue)
                End If
            End If
        Next

    Catch ex As Exception
        MsgBox("Error " + ex.Message)

    End Try
    Return Nothing

End Function

Public Shared Function Scrapnums(lstbox As ListBox, lstnum As ListBox)
    Try
        Dim hw As New HtmlWeb()
        doc = hw.Load(lstbox.SelectedItem)

        Dim data = doc.DocumentNode.SelectSingleNode("//*[@class='det_ad f_left']").InnerText

        Dim m As Match = Regex.Match(data, "(\+92|0092)-?\d{3}-?\d{7}|\d{11}|\d{4}-\d{7}")

        ' Only add a number when the regex actually matched; otherwise m.Value is an empty string.
        If m.Success AndAlso Not lstnum.Items.Contains(m.Value) Then
            lstnum.Items.Add(m.Value)
        End If

    Catch ex As Exception
        ' Errors are silently ignored here.
    End Try
    Return Nothing

End Function

End Class

Upvotes: 2

Views: 507

Answers (1)

pushpraj

Reputation: 13679

Here is a regex to parse the phone numbers:

Regex

((?(?=\+92|0092)(?:\+|00)92\d?(?:-\d{3}-?\d{7}|-?\d{9}|\d{10})|(?:\d{11}|\d{3}\s\d{3}\s\d{5,6}|\d{4}-\d{7})))

Sample numbers

+92-3113143446
+923-113143446
032 124 26003
923 072 776037
03154031162
+923218923116
0307-2796038
+92-343-2842120

Result

  • MATCH 1
    1. [0-14] +92-3113143446
  • MATCH 2
    1. [15-29] +923-113143446
  • MATCH 3
    1. [30-43] 032 124 26003
  • MATCH 4
    1. [44-58] 923 072 776037
  • MATCH 5
    1. [59-70] 03154031162
  • MATCH 6
    1. [71-84] +923218923116
  • MATCH 7
    1. [85-97] 0307-2796038
  • MATCH 8
    1. [98-113] +92-343-2842120
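
To try the pattern from VB.NET, something like the following sketch should work. The module name PhoneRegexDemo and the helper ExtractNumbers are made up for illustration, and RegexOptions.Compiled is an assumption on my part (it may help when the same Regex is reused for many pages, at the cost of slower startup). It uses Regex.Matches so that every number in the text is returned, not just the first one.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module PhoneRegexDemo

    ' The pattern from this answer, unchanged. RegexOptions.Compiled is an
    ' assumption; drop it if the regex is only used a handful of times.
    Private ReadOnly PhonePattern As New Regex(
        "((?(?=\+92|0092)(?:\+|00)92\d?(?:-\d{3}-?\d{7}|-?\d{9}|\d{10})|(?:\d{11}|\d{3}\s\d{3}\s\d{5,6}|\d{4}-\d{7})))",
        RegexOptions.Compiled)

    ' Hypothetical helper: returns each distinct number found in the input,
    ' in the order it first appears.
    Public Function ExtractNumbers(input As String) As List(Of String)
        Dim found As New List(Of String)()
        For Each m As Match In PhonePattern.Matches(input)
            If Not found.Contains(m.Value) Then
                found.Add(m.Value)
            End If
        Next
        Return found
    End Function

    Sub Main()
        ' The sample numbers listed above, one per line.
        Dim samples As String() = {
            "+92-3113143446", "+923-113143446", "032 124 26003",
            "923 072 776037", "03154031162", "+923218923116",
            "0307-2796038", "+92-343-2842120"}

        For Each number As String In ExtractNumbers(String.Join(vbLf, samples))
            Console.WriteLine(number)
        Next
    End Sub

End Module
```

Run against the sample numbers above, this should list the same eight matches. In the question's code, the text passed to ExtractNumbers would be the InnerText of the ad node.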

Demo

Online Demo

The above regex is based on assumptions and may match more patterns than those listed above, so it may need to be refined as needed.

Upvotes: 1
