D. Greg

Reputation: 1000

Regex catch word at the start and end of a UITextView

I'm trying to catch when a word is used in a UITextView. I've got it working for words in the interior of the view.

The problem is when the word is first or last in the view. My code so far:

private func filteredTermFor(_ word: String) -> String {
    let punctuationFilter = "([\\A|\\W|\\d|\\z| ])"
    let wordInParens = "(\(word))"
    return punctuationFilter + wordInParens + punctuationFilter
}

I checked and found I should use ^ for the start of input and $ for the end of input. When I add either of these, for example:

"([^|\\A|\\W|\\d|\\z| ])"

they don't seem to have any effect when the word in question is the first or last in the view.

For completeness, the return value of the function above is used as searchTerm here:

func highlightedTextInString(with searchTerm: String, targetString: String) -> NSAttributedString? {
    let attributedString = NSMutableAttributedString(string: targetString)
    do {
        let regex = try NSRegularExpression(pattern: searchTerm, options: .caseInsensitive)
        let range = NSRange(location: 0, length: targetString.utf16.count)
        for match in regex.matches(in: targetString, options: .withTransparentBounds, range: range) {
            let fontColor = UIColor.red
            attributedString.addAttribute(NSForegroundColorAttributeName, value: fontColor, range: match.range)
        }
        return attributedString
    } catch _ {
        print("Error creating regular expression")
        return nil
    }
}

** Edit ** This was marked as a duplicate, but the linked question does not cover the edge cases where the word is typed next to a punctuation mark or digit without spaces. For example: .word , word9 , ?word?

Upvotes: 1

Views: 451

Answers (1)

Wiktor Stribiżew

Reputation: 626923

Note that ([^|\\A|\\W|\\d|\\z| ]) is a capturing group ((...)) containing a character class, which matches a single character from the set defined inside it. The ^ right after [ negates the class, so it matches any character except the ones in the set. So, [^|\\A|\\W|\\d|\\z| ] matches a single character other than | (it is no longer an alternation operator inside a character class), A (the \ in front is ignored), a non-word character, a digit, z, and a space. It effectively matches _ and any letter other than A and z.

You state that the words you need to match may be surrounded by non-word characters or digits.

You may use

return "(?<![^\\W\\d])(\(word))(?![^\\W\\d])"

See the regex demo.
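As a rough sketch of how that return statement could sit in the question's filteredTermFor(_:), something like the following might work. The NSRegularExpression.escapedPattern(for:) step and the matchRanges(of:in:) helper are additions for illustration and not part of the original code; the helper name is hypothetical.

```swift
import Foundation

// Sketch of the question's filteredTermFor(_:) using the lookaround pattern.
// Assumption: escaping the word is desirable so that a term such as "c++"
// is treated literally; the original code did not escape it.
func filteredTermFor(_ word: String) -> String {
    let escaped = NSRegularExpression.escapedPattern(for: word)
    return "(?<![^\\W\\d])(\(escaped))(?![^\\W\\d])"
}

// Hypothetical helper: returns the ranges where `word` matches in `text`,
// or an empty array if the pattern cannot be compiled.
func matchRanges(of word: String, in text: String) -> [NSRange] {
    guard let regex = try? NSRegularExpression(pattern: filteredTermFor(word),
                                               options: .caseInsensitive) else {
        return []
    }
    let fullRange = NSRange(location: 0, length: text.utf16.count)
    return regex.matches(in: text, options: [], range: fullRange).map { $0.range }
}
```

With this sketch, matchRanges(of: "word", in: ".word word9 ?word?") should find all three occurrences, while "swordfish" and "words" should produce no match because a letter sits directly next to the term.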

Here, "(?<![^\\W\\d])" is a negative lookbehind that matches a location that is NOT immediately preceded by a character other than a non-word or a digit char. This sounds cumbersome, but the main point here is that [^\W\d] matches the same texts as \w excluding digits (\w matches letters, digits, and _). So, "(?<![^\\W\\d])" makes sure there is a start of string or a non-letter and non-_ char right before the word. If you want to allow a word to match after _, just use (?<!\\p{L}) (where \p{L} matches any Unicode letter).

The "(?![^\\W\\d])" is a negative lookahead that makes sure there is an end of string or a non-letter and non-_ char (there can be punctuation, symbols and digits) immediately to the right of the word. Again, if you want to match a word when it is followed by _, you may replace this lookahead with "(?!\\p{L})" (just no letter after the word is allowed).
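To make the difference between the two lookaround variants concrete, here is a small sketch that counts matches for each. The countMatches(_:in:) helper and the names strict and lenient are made up for this illustration, and the search term is hardcoded as "word".

```swift
import Foundation

// Hypothetical helper: counts case-insensitive matches of `pattern` in `text`.
// Returns 0 if the pattern cannot be compiled.
func countMatches(_ pattern: String, in text: String) -> Int {
    guard let regex = try? NSRegularExpression(pattern: pattern,
                                               options: .caseInsensitive) else {
        return 0
    }
    let fullRange = NSRange(location: 0, length: text.utf16.count)
    return regex.numberOfMatches(in: text, options: [], range: fullRange)
}

// Variant from the answer: a neighbouring letter or _ blocks the match.
let strict = "(?<![^\\W\\d])(word)(?![^\\W\\d])"
// \p{L} variant: only a neighbouring letter blocks the match.
let lenient = "(?<!\\p{L})(word)(?!\\p{L})"
```

For the input "_word_", strict should find nothing while lenient should find one match; both should match in "9word9" and neither should match in "swordfish".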

Upvotes: 1
