user2803146

Reputation: 137

Editing Regex Code

I am hoping someone can help me with what I think is a regex problem.

I have a program which takes a piece of HTML code, extracts phone numbers from it and separates them with semicolons. What I would like to do is change this so that it extracts anything between two specific text strings with forward slashes in between. For example:

stringone/******/stringtwo
stringone/876876876876876/stringtwo
stringone/abcdefghijklmnopqrstuvwxyz/stringtwo

Before and after the total string there may or may not be spaces, letters, numbers or special characters.

I have really tried with regex but I can't get this figured out. I assume (and only by instinct) that the line that needs changing is this one:

    .Pattern = "(\+([\d \(\)-]+){10,15})|((\d( |-))?\(?\d{2,4}\)?( |-)\d{3,4}( |-)\d{3,4})|(\d{3,4}( |-)\d{7})"

But the entire code is as follows:

Function Main ( strText )

    dim strResult

    strResult = Extract_Phone_Numbers ( strText )

    Main = strResult

End Function

' This function extracts phone numbers from a specific string using pattern matching (a regular expression).

Function Extract_Phone_Numbers ( strText )

    dim strResult

    Set RegularExpressionObject = New RegExp

    With RegularExpressionObject
    .Pattern = "(\+([\d \(\)-]+){10,15})|((\d( |-))?\(?\d{2,4}\)?( |-)\d{3,4}( |-)\d{3,4})|(\d{3,4}( |-)\d{7})"
    .IgnoreCase = True
    .Global = True
    End With

    Set objMatches = RegularExpressionObject.Execute( strText )

    For Each objMatch in objMatches
        If ( InStr ( strResult, objMatch.value ) = 0  )  Then
            If ( Len ( strResult ) > 0  )  Then
                strResult = strResult + "; "
            End If      
            strResult = strResult + objMatch.value
        End If      
    Next

    Set RegularExpressionObject = nothing

    strResult = Trim ( strResult )

    Extract_Phone_Numbers = strResult

End Function

Can anyone help me to get this changed?

Upvotes: 2

Views: 72

Answers (1)

zx81

Reputation: 41838

  1. In general, the regex for your strings is stringone/[^/]*/stringtwo. The match includes stringone, stringtwo and both slashes.
  2. To match only the text between the slashes, there are several ways. If your regex flavor supports lookarounds, go with this: (?<=stringone/)[^/]*(?=/stringtwo)
  3. VBScript doesn't support lookbehind, so there we need to match the whole string and capture the wanted part in Group 1: stringone/([^/]*)/stringtwo

On the demo, look at the Group 1 captures in the right pane. Note that in this regex tester the slashes had to be escaped.

Explanation

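stringone/ matches the literal text stringone/. The negated character class [^/] then matches one character that is not a /, and the * quantifier repeats that zero or more times. Finally, /stringtwo matches the literal closing text. In the Group 1 version, the parentheses around [^/]* capture the text between the two slashes.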
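As a rough sketch of how the Group 1 pattern could be dropped into the function from the question: the name Extract_Between and the variable objRegExp are made up for this example, and stringone / stringtwo stand in for your real delimiters. In VBScript, the captured groups of each match are read through its SubMatches collection, which is zero-based, so Group 1 is SubMatches(0).

```vbscript
' Sketch only: Extract_Between is a hypothetical name, and
' "stringone" / "stringtwo" are placeholders for the real delimiters.
Function Extract_Between ( strText )

    Dim objRegExp, objMatches, objMatch, strResult

    Set objRegExp = New RegExp

    With objRegExp
        ' The parentheses capture the text between the slashes (Group 1)
        .Pattern = "stringone/([^/]*)/stringtwo"
        .IgnoreCase = True
        .Global = True
    End With

    Set objMatches = objRegExp.Execute( strText )

    For Each objMatch In objMatches
        If Len( strResult ) > 0 Then
            strResult = strResult & "; "
        End If
        ' SubMatches is zero-based, so index 0 is capture Group 1
        strResult = strResult & objMatch.SubMatches(0)
    Next

    Set objRegExp = Nothing

    Extract_Between = strResult

End Function
```

Called with "abc stringone/876876876876876/stringtwo xyz", this should return 876876876876876. Unlike the original function, the sketch does not skip duplicate values; the InStr check from the question can be added back inside the loop if that behaviour is wanted.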

Upvotes: 1
