I'm trying to catch when a word is used in a UITextView. I've got it working for words in the interior of the view. The problem is when the word is first or last in the view. My code so far:
    private func filteredTermFor(_ word: String) -> String {
        let punctuationFilter = "([\\A|\\W|\\d|\\z| ])"
        let wordInParens = "(\(word))"
        return punctuationFilter + wordInParens + punctuationFilter
    }
I checked and found I should use ^ for the start of input and $ for the end of input. When I add either of these, for example:
    "([^|\\A|\\W|\\d|\\z| ])"
they don't seem to have any effect when the word in question is the first or last in the view.
For completeness, the return value of the function above is passed as searchTerm to this function:
    func highlightedTextInString(with searchTerm: String, targetString: String) -> NSAttributedString? {
        let attributedString = NSMutableAttributedString(string: targetString)
        do {
            let regex = try NSRegularExpression(pattern: searchTerm, options: .caseInsensitive)
            // NSRange counts UTF-16 code units, so measure the string the same way.
            let range = NSRange(location: 0, length: targetString.utf16.count)
            // Color every match red.
            for match in regex.matches(in: targetString, options: .withTransparentBounds, range: range) {
                let fontColor = UIColor.red
                attributedString.addAttribute(NSForegroundColorAttributeName, value: fontColor, range: match.range)
            }
            return attributedString
        } catch {
            print("Error creating regular expression")
            return nil
        }
    }
** Edit **
This was marked as a duplicate, but the linked question does not cover the edge cases where the word is typed next to a punctuation mark or digit without spaces.
For example:
    .word
    word9
    ?word?
Upvotes: 1
Views: 451
Reputation: 626923
Note that ([^|\\A|\\W|\\d|\\z| ]) is a capturing group ((...)) containing a character class, and a character class matches exactly one character from the set defined inside it. The ^ right after [ makes the class negated, so it matches any character except those in the set. So, [^|\\A|\\W|\\d|\\z| ] matches a single character other than | (inside a character class it is a literal pipe, not an alternation operator), A (the \ in front is ignored), a non-word character, a digit, z, and a space. It effectively matches _ and any letter other than A and z. Anchors such as ^, \A and \z match positions, not characters, so they cannot work from inside a character class.
You state that the words you need to match may be directly adjacent to punctuation or digits, not only to whitespace.
You may use
    return "(?<![^\\W\\d])(\(word))(?![^\\W\\d])"
See the regex demo.
Here, "(?<![^\\W\\d])" is a negative lookbehind: it fails the match if the position is immediately preceded by a character that is neither a non-word character nor a digit. The double negation sounds cumbersome, but the main point is that [^\W\d] matches the same characters as \w minus the digits (\w matches letters, digits, and _). So, "(?<![^\\W\\d])" makes sure that either the start of the string or a character that is not a letter and not _ comes right before the word. If you want to allow a match after _, use "(?<!\\p{L})" instead (where \p{L} matches any Unicode letter).
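To make the difference between the two lookbehinds concrete, here is a minimal sketch that runs both against the same input. The helper matchLocations and the sample text are made up for illustration; only NSRegularExpression and the two patterns come from the discussion above.

```swift
import Foundation

// Hypothetical helper: start offsets (in UTF-16 units) of every match.
func matchLocations(pattern: String, in text: String) -> [Int] {
    guard let regex = try? NSRegularExpression(pattern: pattern, options: []) else {
        return []
    }
    let fullRange = NSRange(location: 0, length: text.utf16.count)
    return regex.matches(in: text, options: [], range: fullRange).map { $0.range.location }
}

// Made-up sample: "word" at the start, after "_", after a digit, and after a letter.
let text = "word _word 9word sword"

// Rejects a preceding letter or underscore.
let strict = matchLocations(pattern: "(?<![^\\W\\d])word", in: text)

// Rejects only a preceding letter, so "_word" is accepted too.
let lenient = matchLocations(pattern: "(?<!\\p{L})word", in: text)

print(strict)   // [0, 12]
print(lenient)  // [0, 6, 12]
```

Both variants match at offset 0, which is the start-of-input case from the question, and neither matches the "word" inside "sword".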
The "(?![^\\W\\d])" is a negative lookahead that makes sure that either the end of the string or a character that is not a letter and not _ (punctuation, symbols, and digits are all fine) comes immediately after the word. Again, if you want to match a word that is followed by _, replace this lookahead with "(?!\\p{L})" (only a letter right after the word is disallowed).
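As a quick check of the full pattern against the edge cases from the question, the sketch below builds it for a given word and lists where it matches. The names boundedPattern and matchRanges and the sample string are hypothetical. The call to NSRegularExpression.escapedPattern(for:) is an extra precaution that the original pattern does not include; it only matters if the word can contain regex metacharacters.

```swift
import Foundation

// Hypothetical helper: wraps `word` in the two lookarounds discussed above.
func boundedPattern(for word: String) -> String {
    // Assumption: the word should be matched literally, so escape it first.
    let escaped = NSRegularExpression.escapedPattern(for: word)
    return "(?<![^\\W\\d])(\(escaped))(?![^\\W\\d])"
}

// Hypothetical helper: the NSRange of every case-insensitive match in `text`.
func matchRanges(of word: String, in text: String) -> [NSRange] {
    guard let regex = try? NSRegularExpression(pattern: boundedPattern(for: word),
                                               options: .caseInsensitive) else {
        return []
    }
    let fullRange = NSRange(location: 0, length: text.utf16.count)
    return regex.matches(in: text, options: [], range: fullRange).map { $0.range }
}

// Made-up sample covering first/last position, punctuation, digits,
// and three cases that should be rejected (sword, words, word_).
let sample = "word .word, word9, ?word? sword words word_ WORD"
let ranges = matchRanges(of: "word", in: sample)

print(ranges.map { $0.location })  // [0, 6, 12, 20, 44]
```

Each range covers only the word itself, never the neighboring character, because lookarounds do not consume input. That means the ranges can be passed straight to addAttribute without also coloring the adjacent punctuation.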
Upvotes: 1