Reputation: 1557
Suppose I have an HTML link like this:
<a href = "https://mitsui-shopping-park.com/lalaport/koshien/" target="_blank"> https://mitsui-shopping-park.com/lalaport / koshien / </a>
I want to extract:
<a href = "THIS LINK" target="_blank"> NOT THIS LINK </a>
I tried: someString.replacingOccurrences(of: "<[^>]+>", with: "", options: .regularExpression, range: nil)
but that gives me:
<a href = "NOT THIS LINK" target="_blank"> BUT THIS LINK </a>
Please help.
Upvotes: 0
Views: 3698
Reputation: 18581
There's no need for a regular expression: you can use the link attribute of an attributed string.
First, let's use this extension:
extension String {
    func convert2Html() -> NSAttributedString {
        guard let data = data(using: .utf8) else { return NSAttributedString() }
        do {
            let htmlAttrib = NSAttributedString.DocumentType.html
            return try NSAttributedString(data: data,
                                          options: [.documentType: htmlAttrib],
                                          documentAttributes: nil)
        } catch {
            return NSAttributedString()
        }
    }
}
to convert this String:
let html = "<a href = \"https://mitsui-shopping-park.com/lalaport/koshien/\" target=\"_blank\"> https://mitsui-shopping-park.com/lalaport / koshien / </a>"
to an NSAttributedString:
let attrib = html.convert2Html()
And then extract the link this way:
// attribute(_:at:effectiveRange:) raises an exception on an empty string, so check the length first.
if attrib.length > 0,
   let url = attrib.attribute(.link, at: 0, effectiveRange: nil) as? NSURL,
   let href = url.absoluteString {
    print(href) // https://mitsui-shopping-park.com/lalaport/koshien/
}
Upvotes: 4
Reputation: 7618
Use NSRegularExpression.matches for the capture group feature of regular expressions. I always use this handy extension method:
extension String {
    func capturedGroups(withRegex pattern: String) -> [String?] {
        var results = [String?]()
        var regex: NSRegularExpression
        do {
            regex = try NSRegularExpression(pattern: pattern, options: [])
        } catch {
            return results
        }
        // NSRange counts UTF-16 code units, so derive it from the string's indices
        // rather than from `count`, which counts Characters.
        let fullRange = NSRange(startIndex..., in: self)
        let matches = regex.matches(in: self, options: [], range: fullRange)
        guard let match = matches.first else { return results }
        let lastRangeIndex = match.numberOfRanges - 1
        guard lastRangeIndex >= 1 else { return results }
        // Index 0 is the whole match; indices 1... are the capture groups.
        for i in 0...lastRangeIndex {
            let capturedGroupRange = match.range(at: i)
            if capturedGroupRange.length > 0 {
                let matchedString = (self as NSString).substring(with: capturedGroupRange)
                results.append(matchedString)
            } else {
                results.append(nil)
            }
        }
        return results
    }
}
var html = """
<a href = "https://mitsui-shopping-park.com/lalaport/koshien/" target="_blank"> https://mitsui-shopping-park.com/lalaport / koshien / </a>
"""
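// The result is empty when nothing matches, so check before reading group 1.
let groups = html.capturedGroups(withRegex: "href\\s*=\\s*\"([^\"]+)\"")
if groups.count > 1, let href = groups[1] {
    print(href) // https://mitsui-shopping-park.com/lalaport/koshien/
}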
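One detail worth checking when building the NSRange for NSRegularExpression: NSRange is measured in UTF-16 code units, while String.count counts Characters. The sketch below uses a made-up input string (five emoji followed by an href attribute, not the original HTML) to show how a range built from count can stop short of the match.

```swift
import Foundation

// Hypothetical input: five emoji (two UTF-16 units each), then an href attribute.
let text = String(repeating: "\u{1F600}", count: 5) + " href=\"x\""
let regex = try! NSRegularExpression(pattern: "href=\"([^\"]+)\"", options: [])

// A range built from the Character count is shorter than the UTF-16 length.
let shortRange = NSRange(location: 0, length: text.count)
// A range built from the string's indices covers the whole string.
let fullRange = NSRange(text.startIndex..., in: text)

let foundWithShortRange = regex.firstMatch(in: text, options: [], range: shortRange) != nil
let foundWithFullRange = regex.firstMatch(in: text, options: [], range: fullRange) != nil

print(text.count, fullRange.length)
print(foundWithShortRange, foundWithFullRange)
```

For plain ASCII input the two ranges are the same length, which is why the difference is easy to miss.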
Upvotes: 0
Reputation: 318774
Here's one possible solution to grab the value between the href=" and the closing ". This only works with one href in the string.
let html = "<a href = \"https://mitsui-shopping-park.com/lalaport/koshien/\" target=\"_blank\"> https://mitsui-shopping-park.com/lalaport / koshien / </a>"
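if let hrefRange = html.range(of: "(?:href\\s*=\\s*\")[^\"]*(?:\")", options: .regularExpression) {
    // The range covers the whole attribute (href = "..."), so split on the
    // quotes and keep the middle piece, which is the URL itself.
    let parts = html[hrefRange].split(separator: "\"", omittingEmptySubsequences: false)
    let href = parts[1]
    print(href) // https://mitsui-shopping-park.com/lalaport/koshien/
} else {
    print("There is no href")
}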
Let's break down that regular expression:
First, let's remove the extra \ needed in the RE to make it a valid Swift string. This leaves us with:
(?:href\s*=\s*")[^"]*(?:")
This has three main parts:
(?:href\s*=\s*") - the href, optional space, =, optional space, and opening quote
[^"]* - the actual URL - everything that isn't a quote
(?:") - the close quote
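The (?: ) syntax creates a non-capturing group: the pattern inside is grouped without being stored as a numbered capture. The text it matches is still part of the overall match, so the range returned by range(of:options:) covers the whole attribute, including href = and both quotes, and those have to be stripped to get the bare URL.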
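If the string may contain more than one href, a capturing group with NSRegularExpression is one way around the single-match limitation. The following is only a sketch: hrefValues(in:) is a hypothetical helper name, the sample URLs are placeholders, and it assumes every href value is wrapped in double quotes.

```swift
import Foundation

// Hypothetical helper: returns every double-quoted href value found in `html`.
func hrefValues(in html: String) -> [String] {
    // Group 1 captures only the text between the quotes.
    let pattern = "href\\s*=\\s*\"([^\"]*)\""
    guard let regex = try? NSRegularExpression(pattern: pattern, options: []) else {
        return []
    }
    // NSRange is in UTF-16 units, so build it from the string's indices.
    let fullRange = NSRange(html.startIndex..., in: html)
    return regex.matches(in: html, options: [], range: fullRange).compactMap { match -> String? in
        // Convert the NSRange of group 1 back into a Swift string range.
        guard let range = Range(match.range(at: 1), in: html) else { return nil }
        return String(html[range])
    }
}

// Placeholder input with two links.
let sample = "<a href = \"https://example.com/a\">A</a> <a href=\"https://example.com/b\">B</a>"
let links = hrefValues(in: sample)
print(links)
```

Reading group 1 instead of the whole match means no trimming of the href = prefix or the quotes is needed. For anything beyond simple, well-formed markup, a real HTML parser is likely the more robust choice.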
Upvotes: 4