M A Russel

Reputation: 1557

Extract link from href in Swift

Suppose I have an HTML link like this:

<a href = "https://mitsui-shopping-park.com/lalaport/koshien/" target="_blank"> https://mitsui-shopping-park.com/lalaport / koshien / </a>

I want to extract:

<a href = "THIS LINK" target="_blank"> NOT THIS LINK </a> 

I tried: someString.replacingOccurrences(of: "<[^>]+>", with: "", options: .regularExpression, range: nil) but that gives me:

<a href = "NOT THIS LINK" target="_blank"> BUT THIS LINK </a>

Please help.

Upvotes: 0

Views: 3698

Answers (3)

ielyamani

Reputation: 18581

There's no need for a regular expression; you could read the link attribute of an attributed string instead.

First, let's use this extension:

extension String{
    func convert2Html() -> NSAttributedString {

        guard let data = data(using: .utf8) else { return NSAttributedString() }

        do {
            let htmlAttrib = NSAttributedString.DocumentType.html
            return try NSAttributedString(data: data,
                                          options: [.documentType : htmlAttrib],
                                          documentAttributes: nil)
        } catch {
            return NSAttributedString()
        }
    }
}

to convert this String:

let html = "<a href = \"https://mitsui-shopping-park.com/lalaport/koshien/\" target=\"_blank\"> https://mitsui-shopping-park.com/lalaport / koshien / </a>"

to an NSAttributedString:

let attrib = html.convert2Html()

And then extract the link this way:

let link = attrib.attribute(.link, at: 0, effectiveRange: nil)

if let url = link as? NSURL, let href = url.absoluteString {
    print(href)  //https://mitsui-shopping-park.com/lalaport/koshien/
}
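
Reading the attribute at index 0 assumes the attributed string is non-empty and that the link starts at its first character. As a hedged sketch, every link in the string could instead be collected with enumerateAttribute. Here linkURLs(in:) is a hypothetical helper name, and the code assumes an Apple platform where UIKit or AppKit provides the .link key:

```swift
import Foundation
#if canImport(UIKit)
import UIKit
#elseif canImport(AppKit)
import AppKit
#endif

// Hypothetical helper: collects the value of every .link attribute.
func linkURLs(in attributed: NSAttributedString) -> [URL] {
    var urls: [URL] = []
    let fullRange = NSRange(location: 0, length: attributed.length)

    attributed.enumerateAttribute(.link, in: fullRange, options: []) { value, _, _ in
        // The value is nil for runs that carry no link.
        if let url = value as? URL {
            urls.append(url)
        } else if let string = value as? String, let url = URL(string: string) {
            // Links may also be stored as plain strings.
            urls.append(url)
        }
    }
    return urls
}
```

With the attrib value from above, linkURLs(in: attrib).first would then take the place of the attribute(_:at:effectiveRange:) call, and an empty attributed string simply yields an empty array.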

Upvotes: 4

Ricky Mo

Reputation: 7618

Use NSRegularExpression's matches(in:options:range:) to take advantage of regular-expression capture groups. I always use this handy extension method:

extension String {
    func capturedGroups(withRegex pattern: String) -> [String?] {
        var results = [String?]()

        var regex: NSRegularExpression
        do {
            regex = try NSRegularExpression(pattern: pattern, options: [])
        } catch {
            return results
        }

        // NSRange counts UTF-16 code units, so use utf16.count rather than count
        let matches = regex.matches(in: self, options: [], range: NSRange(location: 0, length: self.utf16.count))

        guard let match = matches.first else { return results }
        let lastRangeIndex = match.numberOfRanges - 1
        guard lastRangeIndex >= 1 else { return results }

        for i in 0...lastRangeIndex {
            let capturedGroupRange = match.range(at: i)
            if capturedGroupRange.length > 0 {
                let matchedString = (self as NSString).substring(with: capturedGroupRange)
                results.append(matchedString)
            } else {
                results.append(nil)
            }
        }

        return results
    }
}

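let html = """
<a href = "https://mitsui-shopping-park.com/lalaport/koshien/" target="_blank"> https://mitsui-shopping-park.com/lalaport / koshien / </a>
"""
// Index 0 is the whole match; index 1 is the first capture group.
// The array is empty when nothing matches, so check the count first.
let groups = html.capturedGroups(withRegex: "href\\s*=\\s*\"([^\"]+)\"")
if groups.count > 1, let href = groups[1] {
    print(href)  // https://mitsui-shopping-park.com/lalaport/koshien/
}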
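
The extension above only looks at the first match. If the string may contain several links, a variant that collects the first capture group of every match could look like the following. This is a sketch rather than part of the answer: hrefValues(in:) is a hypothetical name, and it assumes the attribute values are double-quoted. It builds the NSRange from String indices, so non-ASCII text is handled as well.

```swift
import Foundation

// Hypothetical helper: returns the value of every double-quoted href attribute.
func hrefValues(in html: String) -> [String] {
    let pattern = "href\\s*=\\s*\"([^\"]*)\""
    guard let regex = try? NSRegularExpression(pattern: pattern, options: [.caseInsensitive]) else {
        return []
    }

    // Convert the String's index range to an NSRange (UTF-16 based).
    let fullRange = NSRange(html.startIndex..<html.endIndex, in: html)

    return regex.matches(in: html, options: [], range: fullRange).compactMap { match -> String? in
        // Group 1 is the text between the quotes.
        guard let range = Range(match.range(at: 1), in: html) else { return nil }
        return String(html[range])
    }
}

let page = "<a href=\"https://example.com/a\">A</a> <a href = \"https://example.com/b\">B</a>"
print(hrefValues(in: page))  // ["https://example.com/a", "https://example.com/b"]
```

Returning a plain [String] avoids both the optional elements and the unchecked [1] subscript.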

Upvotes: 0

rmaddy

Reputation: 318774

Here's one possible solution to grab the value between the href=" and the closing ". This only works with one href in the string.

let html = "<a href = \"https://mitsui-shopping-park.com/lalaport/koshien/\" target=\"_blank\"> https://mitsui-shopping-park.com/lalaport / koshien / </a>"

if let hrefRange = html.range(of: "(?:href\\s*=\\s*\")[^\"]*(?:\")", options: .regularExpression) {
    let href = html[hrefRange]
    print(href)
} else {
    print("There is no href")
}

Let's break down that regular expression:

First, let's remove the extra \ characters that are only needed to make the RE a valid Swift string. This leaves us with:

(?:href\s*=\s*")[^"]*(?:")

This has three main parts:

(?:href\s*=\s*") - the href, optional space, =, optional space, and opening quote
[^"]* - the actual URL - everything that isn't a quote
(?:") - the close quote

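Note that the (?: ) syntax only makes a group non-capturing; the text it matches is still part of the overall match. The range found above therefore covers the whole attribute, href = "https://mitsui-shopping-park.com/lalaport/koshien/", including the href = prefix and both quotes. To get the URL alone, strip those from the match, or put a capture group around [^"]* and read it with NSRegularExpression.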
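
Since range(of:options:) returns the range of the whole match, the matched text includes href = and the surrounding quotes. As a minimal sketch, the URL can be cut out of that match with plain String indexing. hrefValue(in:) is a hypothetical helper name, not part of the answer above:

```swift
import Foundation

// Hypothetical helper: returns the value of the first double-quoted href, if any.
func hrefValue(in html: String) -> String? {
    let pattern = "href\\s*=\\s*\"[^\"]*\""
    guard let matchRange = html.range(of: pattern, options: .regularExpression) else {
        return nil
    }

    // The match looks like: href = "https://..."
    let attribute = html[matchRange]
    guard let openQuote = attribute.firstIndex(of: "\"") else { return nil }

    // Keep only the text between the opening and the closing quote.
    let valueStart = attribute.index(after: openQuote)
    let valueEnd = attribute.index(before: attribute.endIndex)
    return String(attribute[valueStart..<valueEnd])
}

let html = "<a href = \"https://mitsui-shopping-park.com/lalaport/koshien/\" target=\"_blank\"> https://mitsui-shopping-park.com/lalaport / koshien / </a>"
print(hrefValue(in: html) ?? "There is no href")
// https://mitsui-shopping-park.com/lalaport/koshien/
```

This keeps the single-call range(of:options:) approach and only adds the trimming step.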

Upvotes: 4
