Sam Hearn
Sam Hearn

Reputation: 65

RegEx to extract email

I need to extract only the email from a spreadsheet in Excel. I've found some example VBA code here at this StackOverflow link, courtesy of Portland Runner.

I created an Excel module and it seems to be working fine, except it only returns the first uppercase character of the address into the cell and it's ignoring the email.

For example:

Text                                    | Result
----------------------------------------|------------------------------
My email address is [email protected]   | My email address is  
Yes  [email protected]                  | Yes  A

Below is the code I'm using:

Function simpleCellRegex(Myrange As Range) As String
    Dim regEx As New RegExp
    Dim strPattern As String
    Dim strInput As String
    Dim strReplace As String
    Dim strOutput As String


    strPattern = "[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?"

    If strPattern <> "" Then
        strInput = Myrange.Value
        strReplace = ""

        With regEx
            .Global = True
            .MultiLine = True
            .IgnoreCase = False
            .Pattern = strPattern
        End With

        If regEx.test(strInput) Then
            simpleCellRegex = regEx.Replace(strInput, strReplace)
        Else
            simpleCellRegex = "Not matched"
        End If
    End If
End Function

I do not have enough experience with VBA to really diagnose what might be happening here, hopefully someone will be able to spot what I'm doing wrong.

Working Code

Function simpleCellRegex(Myrange As Range) As String
Dim regEx As New RegExp
Dim strPattern As String
Dim strInput As String
Dim strReplace As String
Dim strOutput As String


strPattern = "[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-zA-Z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?"

If strPattern <> "" Then
    strInput = Myrange.Value
    strReplace = ""

    With regEx
        .Global = True
        .MultiLine = True
        .IgnoreCase = True
        .Pattern = strPattern
    End With

    If regEx.Test(strInput) Then
        Set matches = regEx.Execute(strInput)
        simpleCellRegex = matches(0).Value
    Else
        simpleCellRegex = "Not matched"
    End If
End If
End Function

Upvotes: 3

Views: 5990

Answers (4)

Angu  Ransom
Angu Ransom

Reputation: 1

The easiest way to do this is by installing the software called KUtool. After installing, highlight the content you want to extract emails==>Click ku tools at the top middle==>click on text==>extract emails. You can also use the following code.(ALT+F1==>INSERT MODULE)

Function ExtractEmailFun(extractStr As String) As String
'Update 20130829
Dim CharList As String
On Error Resume Next
CheckStr = "[A-Za-z0-9._-]"
OutStr = ""
Index = 1
Do While True
    Index1 = VBA.InStr(Index, extractStr, "@")
    getStr = ""
    If Index1 > 0 Then
        For p = Index1 - 1 To 1 Step -1
            If Mid(extractStr, p, 1) Like CheckStr Then
                getStr = Mid(extractStr, p, 1) & getStr
            Else
                Exit For
            End If
        Next
        getStr = getStr & "@"
        For p = Index1 + 1 To Len(extractStr)
            If Mid(extractStr, p, 1) Like CheckStr Then
                getStr = getStr & Mid(extractStr, p, 1)
            Else
                Exit For
            End If
        Next
        Index = Index1 + 1
        If OutStr = "" Then
            OutStr = getStr
        Else
            OutStr = OutStr & Chr(10) & getStr
        End If
    Else
        Exit Do
    End Ifenter code here
Loop
ExtractEmailFun = OutStr
End Function

You can also go the code way Open excell, click on ALT +F1, Click on insert Module and paste this code

Click save and enter the formula(Column=ExtractEmailFun(A1)) in a blank cell. press enter and your emails will be extracted. Hope this will help

Upvotes: 0

Marcin Wesel
Marcin Wesel

Reputation: 104

When You return strInput You just get the same string as the input. You need to return Value that has been found using RegExp.

Try

Set matches = regEx.Execute(strInput)
simpleCellRegex = matches(1).Value

Instead of

simpleCellRegex = regEx.Replace(strInput, strReplace)

Upvotes: 1

Pablo Ontiveros
Pablo Ontiveros

Reputation: 11

You can change the line

 simpleCellRegex = regEx.Replace(strInput, strReplace)

To

 simpleCellRegex = strInput

Because you are not making any replacement

Upvotes: 1

izzymo
izzymo

Reputation: 936

Try the below pattern

strPattern  = "^([a-zA-Z0-9_\-\.]+)@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,3})$"

Upvotes: -1

Related Questions