mck

Reputation: 988

How do I write a regex to detect Unicode characters?

I am working on an application in which I have to detect Unicode characters. For example, my text is:

Suzana R°u˘zi˘ckova and Viktor Kalabis, Yvonne Sebastaková, Linda Servitová,
Sandra Stevenson.

I have written a regex for it, "[^\u0000-\u0080]+", but it does not detect all the characters. Also, the word R°u˘zi˘ckova is not displayed correctly in C#: the combining characters should sit on top of the letters, not appear as separate characters.

How can I write a regex that detects all combined characters? I am working in C#.

Upvotes: 1

Views: 438

Answers (1)

Kent

Reputation: 195219

'[\x00-\x7f]' is the ASCII range

'[^\x00-\x7f]' is the non-ASCII character range
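A minimal sketch of the two character classes above in Python's re module, assuming the sample text from the question uses a precomposed á; is_ascii and non_ascii_chars are hypothetical helper names. Without a + quantifier, findall returns each non-ASCII character as a separate match, the same way grep -o prints one match per line.

```python
import re

# [\x00-\x7f] covers every ASCII code point (0-127); negating it matches anything else.
ASCII_ONLY = re.compile(r"[\x00-\x7f]*")
NON_ASCII = re.compile(r"[^\x00-\x7f]")

# Sample text copied from the question (assumption: "á" is one precomposed code point).
text = ("Suzana R°u˘zi˘ckova and Viktor Kalabis, Yvonne Sebastaková, "
        "Linda Servitová,\nSandra Stevenson.")

def is_ascii(s):
    """Hypothetical helper: True if every character of s is in the ASCII range."""
    return ASCII_ONLY.fullmatch(s) is not None

def non_ascii_chars(s):
    """Hypothetical helper: each non-ASCII character of s as a separate match."""
    return NON_ASCII.findall(s)

print(is_ascii("Sandra Stevenson."))  # True
print(is_ascii(text))                 # False
print(non_ascii_chars(text))          # ['°', '˘', '˘', 'á', 'á']
```

The question's [^\u0000-\u0080] differs only at the top end: U+0080 is not an ASCII character, so \x7f (equivalently \u007F) is the correct upper bound for the ASCII range.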

I have no idea about the regex engine in ASP.NET, but you can give it a try.

Here is a test with my grep:

$ echo "Suzana R°u˘zi˘ckova and Viktor Kalabis, Yvonne Sebastaková, Linda Servitová,
Sandra Stevenson."|grep -oP '[^\x00-\x7f]'
°
˘
˘
á
á
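The question also mentions combining characters. As a hedged sketch in Python (combining_marks is a hypothetical helper, and the decomposed spelling of "Růžičková" is an assumed input, not the asker's data): a combining mark is its own code point whose Unicode general category starts with M, and NFC normalization composes a base letter plus its mark into one precomposed letter. The ° and ˘ in the sample, assuming they are U+00B0 DEGREE SIGN and U+02D8 BREVE as they appear, are spacing symbols rather than combining marks, so neither check changes them.

```python
import unicodedata

def combining_marks(s):
    """Hypothetical helper: every combining mark (category Mn, Mc or Me) in s."""
    return [ch for ch in s if unicodedata.category(ch).startswith("M")]

# Assumed input: "Růžičková" in decomposed form, each accent a separate code point
# (U+030A combining ring above, U+030C combining caron, U+0301 combining acute).
decomposed = "Ru\u030az\u030cic\u030ckova\u0301"
composed = unicodedata.normalize("NFC", decomposed)  # base letter + mark -> one letter

print(len(decomposed), len(composed))      # 13 9
print(ascii(combining_marks(decomposed)))  # ['\u030a', '\u030c', '\u030c', '\u0301']
print(combining_marks(composed))           # []

# The marks in the question's sample are spacing symbols, not combining marks.
sample = "R\u00b0u\u02d8zi\u02d8ckova"
for ch in "\u00b0\u02d8":
    print(unicodedata.name(ch), unicodedata.category(ch))  # DEGREE SIGN So, then BREVE Sk
print(combining_marks(sample))                         # []
print(unicodedata.normalize("NFC", sample) == sample)  # True
```

In .NET, the corresponding tools are the \p{M} category in Regex and string.Normalize(NormalizationForm.FormC).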

Upvotes: 1
