Reputation: 10283
I need to convert some HTML into plain text and have tried the approaches outlined here:
Convert HTML to Plain Text in Swift
The problem is that on iOS 8.2, NSAttributedString has a bug which can result in an EXC_BAD_ACCESS crash (deep inside WebKit) when HTML is rendered to plain text on a background thread. The conversion needs to be done on a background thread because it can (and usually does) take a while.
So I need a more primitive solution in Swift, ideally an idiomatic one.
It also strikes me that this is probably one of those problems which has an elegant and neat functional solution. Surely it's essentially a filter() operation on a String?
Upvotes: 4
Views: 4015
Reputation: 784
Somewhat late to the party, but I thought this would benefit other visitors.
Basically, I have taken the solution from here and converted it to Swift 3 syntax.
The solution uses Scanner (previously NSScanner) to find each occurrence of "<", then scans up to the next ">", saving everything in between into an NSString variable.
It then calls replacingOccurrences(of:with:), passing in that NSString variable followed by ">", to remove the tag from the string.
Here's what the final function looks like:
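import Foundation

private func stripHTML(fromString rawString: String) -> String {
    let scanner = Scanner(string: rawString)
    var text: NSString?
    var convertedString = rawString
    while !scanner.isAtEnd {
        // Skip ahead to the next "<" (or to the end if there is none).
        scanner.scanUpTo("<", into: nil)
        // Capture the tag up to, but not including, its closing ">".
        // Only replace when the scan succeeded, so an empty or stale value
        // never strips a bare ">" out of the text.
        if scanner.scanUpTo(">", into: &text), let tag = text {
            convertedString = convertedString.replacingOccurrences(of: "\(tag)>", with: "")
        }
    }
    return convertedString
}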
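The same scan can also be sketched as a single pass over the characters, which avoids one replacingOccurrences call per tag. This is only a rough sketch: the name stripTags is hypothetical, it assumes Swift 4 or later (where a String can be iterated directly), and it assumes every "<" opens a tag, so a literal "<" in the text would swallow whatever follows it up to the next ">".

```swift
// Hypothetical helper, not from the linked solution.
// Drops everything from a "<" through the next ">".
// Assumption: any "<" starts a tag; entities such as &amp; are left as-is.
func stripTags(_ html: String) -> String {
    var result = ""
    var insideTag = false
    for character in html {
        switch character {
        case "<":
            insideTag = true
        case ">" where insideTag:
            insideTag = false
        default:
            // Keep only the characters that sit outside a tag.
            if !insideTag { result.append(character) }
        }
    }
    return result
}
```

A ">" that appears outside a tag is kept, so stripTags("a > b") leaves the string unchanged.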
Upvotes: 2
Reputation: 10283
The best solution I have come up with is a regex within a String extension, which is adequate for the HTML fragments I need to handle:
import Foundation

extension String {
    func plainTextFromHTML() -> String? {
        let regexPattern = "<.*?>"
        var err: NSError?
        if let stripHTMLRegex = NSRegularExpression(pattern: regexPattern, options: NSRegularExpressionOptions.CaseInsensitive, error: &err) {
            // NSRange is measured in UTF-16 units, so use the NSString length rather than count(self).
            let fullRange = NSMakeRange(0, (self as NSString).length)
            let plainText = stripHTMLRegex.stringByReplacingMatchesInString(self, options: nil, range: fullRange, withTemplate: "")
            return err == nil ? plainText : nil
        } else {
            println("Warning: failed to create regular expression from pattern: \(regexPattern)")
            return nil
        }
    }
}
Swift 2.2
import Foundation

extension String {
    func plainTextFromHTML() -> String? {
        let regexPattern = "<.*?>"
        do {
            let stripHTMLRegex = try NSRegularExpression(pattern: regexPattern, options: NSRegularExpressionOptions.CaseInsensitive)
            // NSRange is measured in UTF-16 units, so use the NSString length rather than characters.count.
            let fullRange = NSMakeRange(0, (self as NSString).length)
            let plainText = stripHTMLRegex.stringByReplacingMatchesInString(self, options: [], range: fullRange, withTemplate: "")
            return plainText
        } catch {
            print("Warning: failed to create regular expression from pattern: \(regexPattern)")
            return nil
        }
    }
}
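For Swift 3 and later, the same extension might look roughly like the sketch below. It assumes the renamed Foundation call stringByReplacingMatches(in:options:range:withTemplate:), and it adds the .dotMatchesLineSeparators option, which the versions above do not use, on the assumption that a tag may be broken across lines.

```swift
import Foundation

extension String {
    // Sketch only: same "<.*?>" pattern as the earlier versions.
    func plainTextFromHTML() -> String? {
        let regexPattern = "<.*?>"
        do {
            // Assumption: tags may span lines, so let "." match newlines too.
            let stripHTMLRegex = try NSRegularExpression(pattern: regexPattern, options: [.dotMatchesLineSeparators])
            // NSRange counts UTF-16 code units, so measure the string that way.
            let fullRange = NSRange(location: 0, length: utf16.count)
            return stripHTMLRegex.stringByReplacingMatches(in: self, options: [], range: fullRange, withTemplate: "")
        } catch {
            print("Warning: failed to create regular expression from pattern: \(regexPattern)")
            return nil
        }
    }
}
```

Like the versions above, this only removes tags. It does not decode entities such as &amp;amp; or drop the contents of script and style elements.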
However, a more advanced solution will be needed for full HTML-to-plain-text conversion, since this only strips tags and does not decode entities.
Upvotes: 1