Anmol Dubey
Anmol Dubey

Reputation: 125

Extract pattern from column

I am struggling with a huge Excel sheet (with 200K rows), where I need to extract from a certain column (B) list of all email addresses present in the string.

What I want to achieve:

  1. Extract the email from string
  2. convert (at) to @ and (dot) to .
  3. Save name and email in separate columns

Example of column B:

Shubhomoy Biswas <biswas_shubhomoy777(at)yahoo(dot)com>
Puneet Arora <ar.puneetarora(at)gmail(dot)com>
Anand Upadhyay <001.anand(at)gmail(dot)com>
Rajat Gupta <rajatgupta0889(at)gmail(dot)com>
Sarvesh Sonawane <sarvesh.s(at)suruninfocoresystems.

Although I want to be able to do it on Excel any other Windows-based utility suggestion would be helpful.

Upvotes: 1

Views: 1336

Answers (5)

paul bica
paul bica

Reputation: 10705

This does it for 200 K rows in less than 15 seconds:

Option Explicit

Sub extractPattern()
    Dim ws As Worksheet, ur As Range, rng As Range, t As Double
    Dim fr As Long, fc As Long, lr As Long, lc As Long

    Set ws = Application.ThisWorkbook.Worksheets("Sheet1")
    Set ur = ws.UsedRange
    fr = 1
    fc = 1
    lr = ws.Cells(ur.Row + ur.Rows.Count + 1, fc).End(xlUp).Row
    lc = ws.Cells(fr, ur.Column + ur.Columns.Count + 1).End(xlToLeft).Column

    Set rng = ws.Range(ws.Cells(fr, fc), ws.Cells(lr, fc))

    enableXL False
    t = Timer
    rng.TextToColumns Destination:=ws.Cells(fr, lc + 1), DataType:=xlDelimited, _
                      TextQualifier:=xlDoubleQuote, ConsecutiveDelimiter:=True, _
                      Space:=True
    With ws.Columns(lc + 3)
        .Replace What:="(at)", Replacement:="@", LookAt:=xlPart
        .Replace What:="(dot)", Replacement:=".", LookAt:=xlPart
        .Replace What:="<", Replacement:=vbNullString, LookAt:=xlPart
        .Replace What:=">", Replacement:=vbNullString, LookAt:=xlPart
    End With
    ws.Range(ws.Cells(fr, lc + 1), ws.Cells(fr, lc + 3)).EntireColumn.AutoFit
    Debug.Print "Total rows: " & lr & ", Duration: " & Timer - t & " seconds"
    enableXL    'Total rows: 200,000, Duration: 14.4296875 seconds
End Sub

Private Sub enableXL(Optional ByVal opt As Boolean = True)
    Application.ScreenUpdating = opt
    Application.EnableEvents = opt
    Application.Calculation = IIf(opt, xlCalculationAutomatic, xlCalculationManual)
End Sub

It places the new data in the first unused column at the end (splits the names as well)

enter image description here

Upvotes: 0

Comintern
Comintern

Reputation: 22205

You can also do this easily a regular expression (you'll need to add a reference to Microsoft VBScript Regular Expressions):

Private Sub ExtractEmailInfo(value As String)

    Dim expr As New RegExp
    Dim result As Object
    Dim user As String
    Dim addr As String

    expr.Pattern = "(.+)(<.+>)"
    Set result = expr.Execute(value)
    If result.Count > 0 Then
        user = result(0).SubMatches(0)
        addr = result(0).SubMatches(1)
        'Strip the < and >
        addr = Mid$(addr, 2, Len(addr) - 2)
        addr = Replace$(addr, "(at)", "@")
        addr = Replace$(addr, "(dot)", ".")
    End If

    Debug.Print user
    Debug.Print addr

End Sub

Replace the Debug.Print calls with whatever you need to do to place them in cells.

Upvotes: 0

ara
ara

Reputation: 11

To extract the name, try =TRIM(LEFT(B1,FIND("<",B1)-1)). user3005775's answer works for the email.

Upvotes: 1

user3005775
user3005775

Reputation: 306

this can be done assuming they are all in the same format and only 1 email add per cell

=SUBSTITUTE(SUBSTITUTE(MID(B1,FIND("<",B1)+1,LEN(B1)-FIND("<",B1)-1),"(at)","@"),"(dot)",".")

Upvotes: 2

Gary&#39;s Student
Gary&#39;s Student

Reputation: 96791

Give this a try:

Sub splitter()
   Dim r As Range, v As String

   For Each r In Intersect(Range("B:B"), ActiveSheet.UsedRange)
      v = r.Text
      If v <> "" Then
         ary = Split(v, " <")
         r.Offset(0, 1).Value = ary(0)
         r.Offset(0, 2).Value = Replace(Replace(Replace(ary(1), ">", ""), "(at)", "@"), "(dot)", ".")
      End If
   Next r
End Sub

enter image description here

This sub uses columns C and D for the output. Modify the code to suite your needs.

Upvotes: 1

Related Questions