Mert Nuhoglu

Reputation: 10133

Vim regex matches Unicode characters as non-word characters

I have the following text:

üyü

The following regex search matches the ü characters:

/\W

Is there a unicode flag in Vim regex?

Upvotes: 2

Views: 1138

Answers (4)

slaxor

Reputation: 516

Very often I find that \S\+ takes me where I want to go. For example, :s/\(\S\+\)\s\+\(\S\+\).*/\1 | \2/ applied to the line "wörd1 w€rd2 but not word3" captures the first two whitespace-delimited words and replaces the line with "wörd1 | w€rd2".

Upvotes: 0

Mehrdad Mirreza

Reputation: 1082

I always use:

ASCII                           UTF-8
-----                           -----
\w                              [a-zA-Z\u0100-\uFFFF]
\W                              [^a-zA-Z\u0100-\uFFFF]
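As a rough sketch, the classes from the table can be used directly in a search. Note that Vim's \w also covers digits and the underscore, and that a range starting at U+0100 leaves out Latin-1 letters such as ü (U+00FC). The second pattern is an assumption of mine, not part of the table: it lowers the bound to U+00C0, which also lets in some non-letters (× and ÷, for instance), so adjust it to your text.

```vim
" Runs of "word" characters, using the class from the table as-is.
" ü (U+00FC) is below U+0100, so it is NOT covered by this class.
/[a-zA-Z\u0100-\uFFFF]\+

" Hypothetical variant: add digits and underscore (as in \w) and start
" the range at U+00C0 so that Latin-1 letters like ü are included.
/[0-9A-Za-z_\u00C0-\uFFFF]\+
```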

Upvotes: 3

Ingo Karkat

Reputation: 172540

Unfortunately, there is no such flag (yet).

Some built-in character classes include (or can include) multi-byte characters; others don't. The common \w, \a, \l, and \u classes only contain ASCII characters, so even umlauts aren't included in them, which leads to unexpected behavior! See also https://unix.stackexchange.com/a/60600/18876.

In the 'isprint' option (and in 'iskeyword', which determines what motions like w move over), multi-byte characters 256 and above are always included; only extended ASCII characters up to 255 are specified with these options.
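A quick way to see the difference is to test a character against both kinds of class from the command line. This is only a sketch: the result of the \k test depends on the 'iskeyword' value of the current buffer, which may differ from the default.

```vim
" \w is ASCII-only ([0-9A-Za-z_]), so ü does not match; this echoes 0.
:echo 'ü' =~# '\w'

" \k consults 'iskeyword' of the current buffer, so the result
" depends on that setting.
:echo 'ü' =~# '\k'

" Show the value that \k and motions like w are currently using.
:setlocal iskeyword?
```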

Upvotes: 4

romainl

Reputation: 196546

You can use \%uXXXX to match a multibyte character. In that case…

/\%u00fc

But I'm not aware of a flag that would make the whole matching multibyte-friendly.

Note that with the default value of iskeyword on UNIX systems, ü is matched by \k.
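Building on \k, one possible stand-in for \W is "any character that is not a keyword character". The pattern below is my own sketch rather than a built-in class, and it inherits whatever 'iskeyword' is set to in the current buffer.

```vim
" Hypothetical replacement for \W: \k\@! asserts that the next
" character is not a keyword character, then . consumes it.
/\k\@!.

" Counterpart for \w\+ : a run of keyword characters.
/\k\+
```

With an 'iskeyword' value that covers ü, the first search should skip over üyü, whereas /\W stops on each ü.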

Upvotes: 2
