aidan

Reputation: 9576

Regex to match a Unicode character in Vim

I'm being an idiot.

Someone cut and pasted some text from Microsoft Word into my lovely HTML files.

I now have these Unicode characters instead of regular quote symbols (i.e. quotes appear as <92> in the text).

I want to do a regex replace but I'm having trouble selecting them.

:%s/\u92/'/g
:%s/\u5C/'/g
:%s/\x92/'/g
:%s/\x5C/'/g

...all fail. My google-fu has failed me.

Upvotes: 45

Views: 22669

Answers (3)

Parthiban

Reputation: 1

I faced the same issue. Our CSV files contained what looked like ordinary spaces, but they were actually a non-ASCII character (the no-break space, hex a0), so the output from my program showed the Unicode value instead. Running the command below in Vim sorted it out.

:%s/\%xa0//g


Upvotes: 0

Michael Ekoka

Reputation: 20088

This might not address the problem exactly as stated, but it addresses a closely related one, so I think it makes sense to post it here.

I don't know which version of Vim introduced it, but I was on 7.4 when I tried it.

In Insert mode, the key sequence to enter a Unicode character is ctrl-v u xxxx, where xxxx is the hexadecimal code point. For instance, ctrl-v u 20ac enters the euro sign.

I tried it on the command line as well and it worked. That is, to replace all instances of "20 euro" in my document with "20 €", I'd do:

:%s/20 euro/20 <ctrl-v u 20ac>/gc

In the above, <ctrl-v u 20ac> is not literal text; it's the sequence of keys that produces the character.
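
A key sequence can't be written into a script or mapping, so here is a minimal sketch of a typed-out equivalent. It is not part of the original answer: it assumes the `\=` expression form of the replacement together with the `"\u20ac"` string escape (or `nr2char()`), and that 'encoding' is utf-8.

```vim
" Same replacement as above, but with the euro sign written as an escape
" instead of being entered with ctrl-v. \= makes the replacement an expression.
:%s/20 euro/\="20 \u20ac"/gc

" Equivalent, building the character from its code point with nr2char().
:%s/20 euro/\='20 ' . nr2char(0x20ac)/gc
```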

Upvotes: 3

michaelmichael

Reputation: 14125

From :help regexp (lightly edited): you need some specific syntax to match Unicode characters with a regular expression in Vim:

\%u match specified multibyte character (eg \%u20ac)

That is, to search for the Unicode character with hex code 20AC, enter this into your search pattern:

\%u20ac

The full table of character search patterns includes some additional options:

\%d match specified decimal character (eg \%d123)
\%x match specified hex character (eg \%x2a)
\%o match specified octal character (eg \%o040)
\%u match specified multibyte character (eg \%u20ac)
\%U match specified large multibyte character (eg \%U12345678)
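
Applied to the original question, this might look like the sketch below. It assumes the stray characters really have hex code 92, as Vim's <92> display suggests (Word's smart quotes are bytes 91 to 94 in Windows-1252). It is worth confirming first by putting the cursor on one and pressing ga, which prints the character's code. The e flag is an addition here; it just suppresses the "pattern not found" error when a file has none of a given character.

```vim
" Stray single quotes shown as <91> and <92>.
:%s/\%x91/'/ge
:%s/\%x92/'/ge

" Stray double quotes shown as <93> and <94>.
:%s/\%x93/"/ge
:%s/\%x94/"/ge

" If the file instead contains real Unicode curly quotes
" (U+2018, U+2019, U+201C, U+201D), match them with \%u:
:%s/\%u2018\|\%u2019/'/ge
:%s/\%u201c\|\%u201d/"/ge
```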

Upvotes: 80
