lm.

Reputation: 4311

Checking string for non-latin symbols in C++

For a given path string (char *) I'm trying to check whether it contains any non-Latin symbols.

I'm checking whether it contains at least one character with a code greater than 127 (i.e. outside the 7-bit ASCII range). Is checking this way enough, or is there a more effective way?
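A minimal sketch of the check described above, assuming the path is a NUL-terminated char * in some 8-bit or UTF-8 encoding. The function name has_non_ascii is hypothetical. The one subtle point is the cast to unsigned char: plain char is signed on many platforms, so without it a byte like 0xE9 compares as a negative number and is never "greater than 127".

```cpp
// Hypothetical helper: returns true if the NUL-terminated string contains
// at least one byte outside the 7-bit ASCII range (value greater than 127).
bool has_non_ascii(const char* path) {
    for (const char* p = path; *p != '\0'; ++p) {
        // Cast first: plain char may be signed, making 0x80..0xFF negative.
        if (static_cast<unsigned char>(*p) > 127) {
            return true;
        }
    }
    return false;
}
```

For example, has_non_ascii("C:\\temp\\file.txt") is false, while a path containing a UTF-8 or Windows-1252 encoded "é" returns true.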

Upvotes: 0

Views: 2212

Answers (2)

RvdK

Reputation: 19790

To check for non-Latin characters, it is enough to look for characters with a value above 127 (i.e. 128 or higher). Note that this flags every non-ASCII character, including accented Latin letters such as é. But remember that the meaning of those upper characters cannot be determined from the byte alone. Different code pages were introduced for different languages. For Russian (Cyrillic), for example, there is CP1251 (among others); in that code page the byte 8Dh (141 decimal) is Ќ. But in code page 1256 (Arabic) the same byte means چ. It has the same value, but the meaning is different!
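To make the ambiguity concrete, here is a deliberately partial sketch. The enum CodePage and the function decode_byte are hypothetical names; the table covers only the ASCII range (identical in both code pages) and the single byte 0x8D from the example. A real decoder would need the full 128-entry upper table for each code page.

```cpp
// Hypothetical identifiers for the two Windows code pages mentioned above.
enum class CodePage { Cp1251, Cp1256 };

// Decodes one byte to a Unicode code point under the given code page.
// Partial table only: bytes 0x00-0x7F and the single upper byte 0x8D.
// Anything else returns U+FFFD (REPLACEMENT CHARACTER).
char32_t decode_byte(unsigned char b, CodePage cp) {
    if (b < 0x80) {
        return b;  // lower half is plain ASCII in both code pages
    }
    if (b == 0x8D) {
        return cp == CodePage::Cp1251 ? U'\u040C'   // Ќ CYRILLIC CAPITAL LETTER KJE
                                      : U'\u0686';  // چ ARABIC LETTER TCHEH
    }
    return U'\uFFFD';
}
```

The same input byte yields two different code points, which is why a byte value above 127 tells you "non-ASCII" but not which character it is unless you also know the code page.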

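Unicode 'solves' this problem because every character has a unique value (its code point). Therefore a character is no longer always exactly 8 bits: depending on the encoding, it can take 16 or 32 bits, or a variable number of bytes in UTF-8.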

The first 128 characters of Unicode and ASCII match for legacy reasons.
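Both points (variable size, and the shared first 128 characters) can be seen in a minimal UTF-8 encoder sketch. UTF-8 is chosen here as one of several Unicode encodings, not something the answer prescribes, and encode_utf8 is a hypothetical helper that skips validation of surrogates and values above U+10FFFF.

```cpp
#include <string>

// Hypothetical helper: encodes one Unicode code point as UTF-8.
// Sketch only: no rejection of surrogates or values above U+10FFFF.
std::string encode_utf8(char32_t cp) {
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);                          // 1 byte, identical to ASCII
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));            // 110xxxxx
        out += static_cast<char>(0x80 | (cp & 0x3F));          // 10xxxxxx
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));           // 1110xxxx
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));           // 11110xxx
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

Here encode_utf8(U'A') is the single byte "A", while Ќ (U+040C) becomes the two bytes D0 8C and چ (U+0686) becomes DA 86. Every byte of a multi-byte UTF-8 sequence has its high bit set, so the "greater than 127" test from the question also detects non-ASCII characters in UTF-8 strings.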

For background, read 'The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)' by Joel Spolsky.

Upvotes: 3

Tony Delroy

Reputation: 106068

That depends on what you're doing with the string (i.e. which API function you're using), your operating system and possibly your file system, even the driver for the file system device. You should provide more information. ASCII characters in the range 32..126 tend to be widely accepted and recognisable (32 is a space, and 127 is the DEL control character, which often displays like a space, so it's best avoided), but more characters may be legal in your particular environment.
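A sketch of the conservative whitelist described above, assuming a NUL-terminated string. The name is_printable_ascii is hypothetical, and whether characters outside this range are acceptable still depends on the API, OS and file system in use.

```cpp
// Hypothetical helper: returns true if every byte of the NUL-terminated
// string lies in the printable ASCII range 32..126 (space through '~').
// Control characters (0..31), DEL (127) and all bytes above 127 fail.
bool is_printable_ascii(const char* s) {
    for (const char* p = s; *p != '\0'; ++p) {
        const unsigned char c = static_cast<unsigned char>(*p);
        if (c < 32 || c > 126) {
            return false;
        }
    }
    return true;
}
```

This is stricter than the question's check: besides rejecting bytes above 127, it also rejects tabs, newlines and DEL.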

Upvotes: 0
