Reputation: 11
I was wondering if anyone could show me how to extrapolate 'http://www.nbc.com/xyz' and 'I love this show' from the following string in Excel - VB.
Thanks
<a href="http://www.nbc.com/xyz" >I love this show</a><IMG border=0 width=1 height=1 src="http://ad.linksynergy.com/fs-bin/show?id=Loe5O5QVFig&bids=261463.100016851&type=3&subid=0" >
Upvotes: 1
Views: 268
Reputation: 166850
Sub Tester()
'### add a reference to "Microsoft HTML Object Library" ###
Dim odoc As New MSHTML.HTMLDocument
Dim el As Object
Dim txt As String
txt = "<a href=""http://www.nbc.com/xyz"" >I love this show</a>" & _
"<IMG border=0 width=1 height=1 " & _
"src=""http://ad.linksynergy.com/fs-bin/show?" & _
"id=Loe5O5QVFig&bids=261463.100016851&type=3&subid=0"" >"
odoc.body.innerHTML = txt
Set el = odoc.getElementsByTagName("a")(0)
Debug.Print el.innerText
Debug.Print el.href
End Sub
Upvotes: 4
Reputation:
Once way is using regular expressions. Another way is using Split to split the strings on various delimiters Eg
Option Explicit
Sub splitMethod()
Dim Str As String
Str = Sheet1.Range("A1").Value
Debug.Print Split(Str, """")(1)
Debug.Print Split(Split(Str, ">")(1), "</a")(0)
End Sub
Sub RegexMethod()
Dim Str As String
Dim oRegex As Object
Dim regexArr As Object
Dim rItem As Object
'Assumes Sheet1.Range("A1").Value holds example string
Str = Sheet1.Range("A1").Value
Set oRegex = CreateObject("vbscript.regexp")
With oRegex
.Global = True
.Pattern = "(href=""|>)(.+?)(""|</a>)"
Set regexArr = .Execute(Str)
'No lookbehind so replace unwanted chars
.Pattern = "(href=""|>|""|</a>)"
For Each rItem In regexArr
Debug.Print .Replace(rItem, vbNullString)
Next rItem
End With
End Sub
'Output:
'http://www.nbc.com/xyz
'I love this show
This matches href="
or >
at the start of the string, "
or </a>
at the end of the string with any character (except \n newline) in between (.+?)
Upvotes: 0