Reputation: 125
I am struggling with a huge Excel sheet (with 200K rows), where I need to extract from a certain column (B) list of all email addresses present in the string.
What I want to achieve:
(at)
to @
and (dot)
to .
Example of column B:
Shubhomoy Biswas <biswas_shubhomoy777(at)yahoo(dot)com>
Puneet Arora <ar.puneetarora(at)gmail(dot)com>
Anand Upadhyay <001.anand(at)gmail(dot)com>
Rajat Gupta <rajatgupta0889(at)gmail(dot)com>
Sarvesh Sonawane <sarvesh.s(at)suruninfocoresystems.
Although I want to be able to do it on Excel any other Windows-based utility suggestion would be helpful.
Upvotes: 1
Views: 1336
Reputation: 10705
This does it for 200 K rows in less than 15 seconds:
Option Explicit
Sub extractPattern()
Dim ws As Worksheet, ur As Range, rng As Range, t As Double
Dim fr As Long, fc As Long, lr As Long, lc As Long
Set ws = Application.ThisWorkbook.Worksheets("Sheet1")
Set ur = ws.UsedRange
fr = 1
fc = 1
lr = ws.Cells(ur.Row + ur.Rows.Count + 1, fc).End(xlUp).Row
lc = ws.Cells(fr, ur.Column + ur.Columns.Count + 1).End(xlToLeft).Column
Set rng = ws.Range(ws.Cells(fr, fc), ws.Cells(lr, fc))
enableXL False
t = Timer
rng.TextToColumns Destination:=ws.Cells(fr, lc + 1), DataType:=xlDelimited, _
TextQualifier:=xlDoubleQuote, ConsecutiveDelimiter:=True, _
Space:=True
With ws.Columns(lc + 3)
.Replace What:="(at)", Replacement:="@", LookAt:=xlPart
.Replace What:="(dot)", Replacement:=".", LookAt:=xlPart
.Replace What:="<", Replacement:=vbNullString, LookAt:=xlPart
.Replace What:=">", Replacement:=vbNullString, LookAt:=xlPart
End With
ws.Range(ws.Cells(fr, lc + 1), ws.Cells(fr, lc + 3)).EntireColumn.AutoFit
Debug.Print "Total rows: " & lr & ", Duration: " & Timer - t & " seconds"
enableXL 'Total rows: 200,000, Duration: 14.4296875 seconds
End Sub
Private Sub enableXL(Optional ByVal opt As Boolean = True)
Application.ScreenUpdating = opt
Application.EnableEvents = opt
Application.Calculation = IIf(opt, xlCalculationAutomatic, xlCalculationManual)
End Sub
It places the new data in the first unused column at the end (splits the names as well)
Upvotes: 0
Reputation: 22205
You can also do this easily a regular expression (you'll need to add a reference to Microsoft VBScript Regular Expressions):
Private Sub ExtractEmailInfo(value As String)
Dim expr As New RegExp
Dim result As Object
Dim user As String
Dim addr As String
expr.Pattern = "(.+)(<.+>)"
Set result = expr.Execute(value)
If result.Count > 0 Then
user = result(0).SubMatches(0)
addr = result(0).SubMatches(1)
'Strip the < and >
addr = Mid$(addr, 2, Len(addr) - 2)
addr = Replace$(addr, "(at)", "@")
addr = Replace$(addr, "(dot)", ".")
End If
Debug.Print user
Debug.Print addr
End Sub
Replace the Debug.Print
calls with whatever you need to do to place them in cells.
Upvotes: 0
Reputation: 11
To extract the name, try =TRIM(LEFT(B1,FIND("<",B1)-1)). user3005775's answer works for the email.
Upvotes: 1
Reputation: 306
this can be done assuming they are all in the same format and only 1 email add per cell
=SUBSTITUTE(SUBSTITUTE(MID(B1,FIND("<",B1)+1,LEN(B1)-FIND("<",B1)-1),"(at)","@"),"(dot)",".")
Upvotes: 2
Reputation: 96791
Give this a try:
Sub splitter()
Dim r As Range, v As String
For Each r In Intersect(Range("B:B"), ActiveSheet.UsedRange)
v = r.Text
If v <> "" Then
ary = Split(v, " <")
r.Offset(0, 1).Value = ary(0)
r.Offset(0, 2).Value = Replace(Replace(Replace(ary(1), ">", ""), "(at)", "@"), "(dot)", ".")
End If
Next r
End Sub
This sub uses columns C and D for the output. Modify the code to suite your needs.
Upvotes: 1