Reputation: 315
I need to detect whether a string contains characters from a non-Latin alphabet. Digits and special symbols such as -, _ and + are fine. I only need to know whether there are any non-Latin characters. For example:
"123sdjjsf-4KSD".just_latin? should return true.
"12333ыц4--sdf".just_latin? should return false.
Upvotes: 10
Views: 5019
Reputation: 1830
There you go: just match the allowed characters and you are done (a-z means the characters from a to z, 0-9 the digits): ^[a-zA-Z0-9_\-+]+$
Upvotes: 1
Reputation: 1042
I think that this should work for you:
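# encoding: UTF-8
class String
  # True when the string holds only ASCII letters, digits, spaces and _ - +
  def just_latin?
    # \A and \z anchor the whole string; in Ruby, ^ and $ only anchor a line,
    # so a multi-line string could slip through with them.
    !!match(/\A[a-zA-Z0-9_\-+ ]*\z/)
  end
end

puts "123sdjjsf-4KSD".just_latin?  # true
puts "12333ыц4--sdf".just_latin?   # false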
Note that String#ascii_only? is also very close to what you want.
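For comparison, here is a rough sketch of where String#ascii_only? and a whitelist regex disagree. The constant name STRICT and the third sample string are made up for illustration; the first two strings come from the question. ascii_only? rejects every non-ASCII character, but it also accepts ASCII punctuation that the whitelist does not list:

```ruby
# encoding: UTF-8

# Hypothetical name: the whitelist from the answer above, anchored to the whole string
STRICT = /\A[a-zA-Z0-9_\-+ ]*\z/

samples = ["123sdjjsf-4KSD", "12333ыц4--sdf", "hello, world!"]

# Map each sample to [ascii_only?, whitelist match]
results = samples.map { |s| [s, [s.ascii_only?, !!(s =~ STRICT)]] }.to_h

results.each do |s, (ascii, strict)|
  puts "#{s.inspect}: ascii_only?=#{ascii} whitelist=#{strict}"
end
```

So ascii_only? is enough if any ASCII character is acceptable, and the whitelist is the safer choice if only specific symbols should pass.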
Upvotes: 7
Reputation: 44259
The following regular expression matches a single letter that is not Latin:
[\p{L}&&[^a-zA-Z]]
The && syntax intersects two character classes. The first one (\p{L}) matches any Unicode letter. The second one ([^a-zA-Z]) matches any character that is not (^) a Latin letter (a-z or A-Z). That is, the whole character class matches any letter that is not a Latin one.
So if you use this regular expression inside just_latin? and return true if no match is found, it should work just like you want it to.
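A minimal sketch of just_latin? built on that intersection, assuming UTF-8 strings and Ruby 2.4 or later (for String#match?). The constant name NON_LATIN_LETTER is invented here for readability:

```ruby
# encoding: UTF-8
class String
  # Hypothetical constant name: any Unicode letter (\p{L}) that is not a-z or A-Z
  NON_LATIN_LETTER = /[\p{L}&&[^a-zA-Z]]/

  # True when the string contains no letter outside a-z / A-Z
  def just_latin?
    !match?(NON_LATIN_LETTER)
  end
end

puts "123sdjjsf-4KSD".just_latin?
puts "12333ыц4--sdf".just_latin?
```

Because only letters are checked, punctuation and digits of any kind pass, which is looser than a whitelist of allowed characters.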
I first tried the Unicode property \p{Latin} for the second character class, but that is not entirely reliable, since \p{Latin} also includes, for instance, the Icelandic characters þ, æ and ð.
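A quick way to see this, as a sketch assuming UTF-8 source encoding and Ruby 2.4 or later (the variable names are illustrative only):

```ruby
# encoding: UTF-8
chars = %w[a þ æ ð ы]

# Which characters belong to the Latin script according to \p{Latin}?
latin_script = chars.map { |ch| [ch, ch.match?(/\p{Latin}/)] }.to_h
# Which characters are plain a-z / A-Z?
basic_latin = chars.map { |ch| [ch, ch.match?(/[a-zA-Z]/)] }.to_h

chars.each do |ch|
  puts "#{ch}: \\p{Latin}=#{latin_script[ch]} a-zA-Z=#{basic_latin[ch]}"
end
```

The Icelandic letters count as Latin script but fall outside a-zA-Z, while the Cyrillic ы fails both checks.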
Upvotes: 5