Reputation: 725
I want to extract the links from a String with a regular expression. I found a similar post here and tried this code:
let regex = try! NSRegularExpression(pattern: "<a[^>]+href=\"(.*?)\"[^>]*>.*?</a>")
// NSRange counts UTF-16 code units, so use utf16.count rather than characters.count
let range = NSMakeRange(0, text.utf16.count)
let htmlLessString: String = regex.stringByReplacingMatches(in: text,
    options: [],
    range: range,
    withTemplate: "")
but the proposed regular expression deletes the entire content of the href attribute. My string looks like this:
SOME stirng <a href="https://com.mywebsite.com/yfgvh/f23/fsd" rel="DFGHJ"> some text I need to keep </a> and other text
and the expected result is:
SOME stirng https://com.mywebsite.com/yfgvh/f23/fsd some text I need to keep and other text
The ideal result would be:
SOME stirng some text I need to keep (https://com.mywebsite.com/yfgvh/f23/fsd) and other text
Do you have an idea if it's possible to achieve this?
Upvotes: 0
Views: 19038
Reputation: 1548
I'm not a regular Swift developer, but have you tried using the withTemplate parameter of stringByReplacingMatches, like this?
// Group 1 is the href value, group 2 is the link text
let regex = try! NSRegularExpression(pattern: "<a[^>]+href=\"(.*?)\"[^>]*>(.*?)</a>")
// NSRange counts UTF-16 code units
let range = NSMakeRange(0, text.utf16.count)
let htmlLessString: String = regex.stringByReplacingMatches(in: text,
    options: [],
    range: range,
    withTemplate: "$2 ($1)")
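To make the template idea concrete, here is a minimal self-contained sketch run against the sample string from the question. It assumes Swift 4 or later with Foundation. The `\s*` around the second group is my own addition, not part of the pattern above; it trims the spaces inside the link so the output matches the "ideal result" from the question.

```swift
import Foundation

let text = "SOME stirng <a href=\"https://com.mywebsite.com/yfgvh/f23/fsd\" rel=\"DFGHJ\"> some text I need to keep </a> and other text"

// Group 1: the href value. Group 2: the link text, without surrounding whitespace
// (the \s* parts are an assumption added for this sketch).
let regex = try! NSRegularExpression(pattern: "<a[^>]+href=\"(.*?)\"[^>]*>\\s*(.*?)\\s*</a>")

// Build the NSRange from the String itself so it is measured in UTF-16 code units.
let range = NSRange(text.startIndex..., in: text)

// $1 and $2 in the template refer to the capture groups of each match.
let result = regex.stringByReplacingMatches(in: text,
                                            options: [],
                                            range: range,
                                            withTemplate: "$2 ($1)")
print(result)
// SOME stirng some text I need to keep (https://com.mywebsite.com/yfgvh/f23/fsd) and other text
```

Note that `.` does not match line breaks by default, so a link whose text spans several lines would not be replaced by this sketch.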
Upvotes: 1
Reputation: 285069
Of course it deletes the href content, because you are calling stringByReplacingMatches with an empty string as the template.
Your sample string does not match the pattern because the closing tag </a> is missing.
The pattern "<a[^>]+href=\"(.*?)\"[^>]*>" matches up to the closing angle bracket after the link.
The captured group is located at index 1 of the match. This code prints all extracted links:
let text = "<a href=\"https://com.mywebsite.com/yfgvh/f23/fsd\" rel=\"DFGHJ\">"
let regex = try! NSRegularExpression(pattern: "<a[^>]+href=\"(.*?)\"[^>]*>")
// NSRange counts UTF-16 code units
let range = NSMakeRange(0, text.utf16.count)
let matches = regex.matches(in: text, range: range)
for match in matches {
    // range(at: 1) is the first capture group (rangeAt(1) in Swift 3)
    let htmlLessString = (text as NSString).substring(with: match.range(at: 1))
    print(htmlLessString)
}
Upvotes: 4
Reputation: 1332
This regex seems to work in this case: href="(.*)" .*">(.*)<\/a>(.*)
Group 1 holds your URL, group 2 the text between <a> and </a>, and group 3 the text after </a>. However, reading the groups takes some extra code, because NSRegularExpression only reports them as NSRange values. This extension wraps that up: http://samwize.com/2016/07/21/how-to-capture-multiple-groups-in-a-regex-with-swift/
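For comparison, the groups can also be read with Foundation alone, through `range(at:)` on each match. The following is a minimal sketch assuming Swift 4 or later. `captureGroups(of:in:)` is a hypothetical helper name chosen for illustration; it is not a Foundation API and not the code from the linked post.

```swift
import Foundation

// Hypothetical helper: returns the capture groups of the first match, or [] if nothing matches.
func captureGroups(of pattern: String, in text: String) -> [String] {
    guard let regex = try? NSRegularExpression(pattern: pattern),
          let match = regex.firstMatch(in: text, range: NSRange(text.startIndex..., in: text))
    else { return [] }
    // Index 0 is the whole match; capture groups start at index 1.
    // Groups that did not participate in the match are skipped.
    return (1..<match.numberOfRanges).compactMap { index in
        Range(match.range(at: index), in: text).map { String(text[$0]) }
    }
}

let text = "SOME stirng <a href=\"https://com.mywebsite.com/yfgvh/f23/fsd\" rel=\"DFGHJ\"> some text I need to keep </a> and other text"

// The pattern from this answer, written as a Swift string literal.
let groups = captureGroups(of: "href=\"(.*)\" .*\">(.*)</a>(.*)", in: text)
print(groups)
```

With the sample string, group 2 keeps the spaces just inside the link, so it may need trimming before use.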
Upvotes: 0