Tony Lee

Reputation: 31

NSRegularExpression with French characters

Why does the pattern

[A-Z][A-z]*

return Ve for the French word Vénus when used with NSRegularExpression? I want to match camel-case words, but this word behaves strangely.

Upvotes: 2

Views: 209

Answers (1)

Tim Pietzcker

Reputation: 336448

Your regex matches Ve and not Vénus because there are two ways to represent é in Unicode:

  • As the single precomposed code point U+00E9 (the NFC form), or
  • In the "decomposed" form (NFD): e (U+0065) followed by the combining acute accent (U+0301). Note that the latter is not the standalone ´ character (U+00B4).
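To make the difference concrete, here is a small Swift sketch (the variable names are illustrative) that builds Vénus both ways. Swift's String comparison treats the two spellings as equal because it uses canonical equivalence, but the underlying code point sequences differ in length and content. That code point sequence is what a regular expression engine actually walks over.

```swift
// "Vénus" spelled two ways: é as one precomposed code point (U+00E9),
// or as "e" followed by COMBINING ACUTE ACCENT (U+0301).
let precomposed = "V\u{00E9}nus"
let decomposed = "Ve\u{0301}nus"

// Swift's String == compares by canonical equivalence, so these are "equal"...
print(precomposed == decomposed)          // true

// ...but they are different code point sequences.
print(precomposed.unicodeScalars.count)   // 5
print(decomposed.unicodeScalars.count)    // 6

// Hex code points of the decomposed form: V, e, U+0301, n, u, s.
print(decomposed.unicodeScalars.map { String($0.value, radix: 16, uppercase: true) })
// ["56", "65", "301", "6E", "75", "73"]
```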

Your string apparently uses the second option, so [A-z] matches only the e, the first half of the combined character. The combining accent that follows does not match, so the regex stops there. You should normalize the string to the precomposed form (NFC) before applying a regex to it; in Foundation, precomposedStringWithCanonicalMapping does this.
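A minimal Swift sketch that reproduces the behaviour from the question and then normalizes the input. The helper allMatches is hypothetical, written only for this illustration; it slices through NSString so the results reflect the UTF-16 ranges that NSRegularExpression reports. The decomposed input is assumed to match what the asker had.

```swift
import Foundation

// Hypothetical helper: every substring of `text` that `pattern` matches.
// Slicing via NSString keeps the exact UTF-16 ranges the regex reports.
func allMatches(_ pattern: String, in text: String) -> [String] {
    let regex = try! NSRegularExpression(pattern: pattern)
    let ns = NSString(string: text)
    let fullRange = NSRange(location: 0, length: ns.length)
    return regex.matches(in: text, options: [], range: fullRange)
        .map { ns.substring(with: $0.range) }
}

let decomposed = "Ve\u{0301}nus"  // e + U+0301, assumed to be the question's input
let normalized = decomposed.precomposedStringWithCanonicalMapping  // NFC

// The regex works on code points, so it stops at the combining accent.
print(allMatches("[A-Z][A-z]*", in: decomposed))        // ["Ve"]

// After NFC normalization, é is the single code point U+00E9.
print(normalized.unicodeScalars.count)                   // 5
print(normalized.unicodeScalars.contains("\u{00E9}"))    // true
```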

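Furthermore, use [A-Za-z] instead of [A-z]. The range A-z also spans the ASCII characters between Z and a ([, \, ], ^, _ and `), so non-letters like ^ or ] would be matched too. Keep in mind that [A-Za-z] covers only ASCII letters and will not match é even after normalization; to accept accented letters, use Unicode property classes such as \p{Lu}\p{Ll}*.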
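The following Swift sketch compares the character classes side by side. The helper allMatches is hypothetical, and the sample string "Foo]Bar" is made up for illustration. \p{Lu} (uppercase letter) and \p{Ll} (lowercase letter) are ICU Unicode-property classes that NSRegularExpression accepts; treating a camel word as "one uppercase letter followed by lowercase letters" is an assumption about what the asker wants.

```swift
import Foundation

// Hypothetical helper: every substring of `text` that `pattern` matches.
func allMatches(_ pattern: String, in text: String) -> [String] {
    let regex = try! NSRegularExpression(pattern: pattern)
    let ns = NSString(string: text)
    let fullRange = NSRange(location: 0, length: ns.length)
    return regex.matches(in: text, options: [], range: fullRange)
        .map { ns.substring(with: $0.range) }
}

// [A-z] spans U+0041...U+007A, which includes "]" between Z and a.
print(allMatches("[A-Z][A-z]*", in: "Foo]Bar"))       // ["Foo]Bar"]
print(allMatches("[A-Z][A-Za-z]*", in: "Foo]Bar"))    // ["Foo", "Bar"]

// ASCII-only classes never match é, even after NFC normalization.
let venus = "Ve\u{0301}nus".precomposedStringWithCanonicalMapping
print(allMatches("[A-Z][A-Za-z]*", in: venus))        // ["V"]

// Unicode properties match accented letters, but only once é is one code point.
print(allMatches("\\p{Lu}\\p{Ll}*", in: venus))       // ["Vénus"]
print(allMatches("\\p{Lu}\\p{Ll}*", in: "Ve\u{0301}nus"))  // ["Ve"]
```

The last line shows why normalization still matters with \p{Ll}: the combining accent U+0301 is a mark (\p{M}), not a letter, so the unnormalized string still stops after the e.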

Upvotes: 2
