J.Doe

Reputation: 725

Regular expression to extract href url

I want to extract the links from a String using regular expressions. I found a similar post here and tried this code:

let regex = try! NSRegularExpression(pattern: "<a[^>]+href=\"(.*?)\"[^>]*>.*?</a>")
let range = NSMakeRange(0, text.utf16.count)
let htmlLessString: String = regex.stringByReplacingMatches(in: text,
                                                            options: [],
                                                            range: range,
                                                            withTemplate: "")

but this regular expression removed the whole <a> tag, including the href value and the link text. My string looks like this:

SOME stirng  <a href="https://com.mywebsite.com/yfgvh/f23/fsd" rel="DFGHJ"> some text I need to keep </a> and other text

and the expected result is

SOME stirng  https://com.mywebsite.com/yfgvh/f23/fsd some text I need to keep and other text

The ideal result would be

SOME stirng some text I need to keep (https://com.mywebsite.com/yfgvh/f23/fsd) and other text

Is it possible to achieve this?

Upvotes: 0

Views: 19038

Answers (3)

dloeda

Reputation: 1548

I'm not a regular Swift developer, but did you try using the withTemplate parameter of stringByReplacingMatches, like this?

let regex = try! NSRegularExpression(pattern: "<a[^>]+href=\"(.*?)\"[^>]*>(.*?)</a>")
let range = NSMakeRange(0, text.utf16.count)
// $1 is the href value, $2 is the text between the opening tag and </a>
let htmlLessString: String = regex.stringByReplacingMatches(in: text,
                                                            options: [],
                                                            range: range,
                                                            withTemplate: "$2 ($1)")
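To make the template idea concrete, here is a minimal, self-contained sketch run against the sample string from the question. The helper name inlineLinks is made up for illustration, and the \s* around the second group is my own addition so that the spaces inside the tag do not end up in the output. It is only a sketch for simple, well-formed anchors, not a general HTML parser.

```swift
import Foundation

// Hypothetical helper (not from the question): rewrites
// <a href="URL">label</a> as "label (URL)".
func inlineLinks(in text: String) -> String {
    // Group 1: the href value. Group 2: the link label without surrounding whitespace.
    let pattern = "<a[^>]+href=\"(.*?)\"[^>]*>\\s*(.*?)\\s*</a>"
    let regex = try! NSRegularExpression(pattern: pattern,
                                         options: [.dotMatchesLineSeparators])
    // NSRange counts UTF-16 code units, so derive it from the String itself.
    let range = NSRange(text.startIndex..., in: text)
    return regex.stringByReplacingMatches(in: text,
                                          options: [],
                                          range: range,
                                          withTemplate: "$2 ($1)")
}

let text = "SOME stirng  <a href=\"https://com.mywebsite.com/yfgvh/f23/fsd\" rel=\"DFGHJ\"> some text I need to keep </a> and other text"
let result = inlineLinks(in: text)
print(result)
```

Swapping the template to "$1 $2" should give the URL-first form from the question instead.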

Upvotes: 1

vadian

Reputation: 285069

Of course it deletes the href content: you are calling stringByReplacingMatches with an empty string as the template, so every match is replaced with nothing.

Your sample string does not match the pattern because the closing tag </a> is missing.

The pattern "<a[^>]+href=\"(.*?)\"[^>]*>" matches only up to the closing angle bracket of the opening tag, so it does not require </a>.

The captured group is located at index 1 of the match. This code prints all extracted links:

import Foundation

let text = "<a href=\"https://com.mywebsite.com/yfgvh/f23/fsd\" rel=\"DFGHJ\">"

let regex = try! NSRegularExpression(pattern: "<a[^>]+href=\"(.*?)\"[^>]*>")
// NSRange is measured in UTF-16 code units
let range = NSMakeRange(0, text.utf16.count)
let matches = regex.matches(in: text, range: range)
for match in matches {
    // Capture group 1 holds the href value
    let htmlLessString = (text as NSString).substring(with: match.range(at: 1))
    print(htmlLessString)
}

Upvotes: 4

Xyzk

Reputation: 1332

This regex seems to work in this case: href="(.*)" .*">(.*)<\/a>(.*). Group 1 holds your URL, group 2 the text between <a> and </a>, and group 3 the text after </a>. To get the values out of the groups you may want to use this extension, since NSRegularExpression hands groups back as ranges rather than as strings: http://samwize.com/2016/07/21/how-to-capture-multiple-groups-in-a-regex-with-swift/
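For what it's worth, the groups can also be read with plain Foundation by converting each match.range(at:) back into a Swift range. Below is a rough sketch of that; captureGroups is a name I made up, it is not the extension from the link, and it assumes the same sample string as in the question.

```swift
import Foundation

// Hypothetical helper: returns the capture groups of every match as Strings.
func captureGroups(of pattern: String, in text: String) -> [[String]] {
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return [] }
    let fullRange = NSRange(text.startIndex..., in: text)
    var result: [[String]] = []
    for match in regex.matches(in: text, options: [], range: fullRange) {
        var groups: [String] = []
        // Index 0 is the whole match, so the capture groups start at index 1.
        for index in 1..<match.numberOfRanges {
            if let range = Range(match.range(at: index), in: text) {
                groups.append(String(text[range]))
            } else {
                // The group did not take part in the match.
                groups.append("")
            }
        }
        result.append(groups)
    }
    return result
}

let text = "SOME stirng  <a href=\"https://com.mywebsite.com/yfgvh/f23/fsd\" rel=\"DFGHJ\"> some text I need to keep </a> and other text"
let groups = captureGroups(of: "href=\"(.*)\" .*\">(.*)</a>(.*)", in: text)
print(groups)
```

Note that group 2 keeps the spaces around the link text, because the pattern itself does not trim them.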

Upvotes: 0
