user1859243

Reputation: 315

How to detect if a string contains only Latin symbols using Ruby 1.9?

I need to detect whether a string contains symbols from a non-Latin alphabet. Digits and special symbols such as -, _, and + are fine. I only need to know whether there are any non-Latin symbols. For example:

"123sdjjsf-4KSD".just_latin?

should return true.

"12333ыц4--sdf".just_latin?

should return false.

Upvotes: 10

Views: 5019

Answers (3)

Javier Diaz

Reputation: 1830

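There you go: just match those characters and you are done (a-z means the characters from a to z). The class also needs 0-9 so that digits are accepted, and in Ruby \A and \z anchor the whole string, whereas ^ and $ anchor each line: \A[a-zA-Z0-9_\-+]+\z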

Upvotes: 1

G. Allen Morris III

Reputation: 1042

I think that this should work for you:

 # encoding: UTF-8

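 class String
   # True when the string holds only ASCII letters, digits, "_", "-", "+" and spaces.
   # \A and \z anchor the whole string; ^ and $ would only anchor individual lines.
   def just_latin?
     !!self.match(/\A[a-zA-Z0-9_\-+ ]*\z/)
   end
 end

 puts "123sdjjsf-4KSD".just_latin?  # prints true
 puts "12333ыц4--sdf".just_latin?   # prints false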

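Note that String#ascii_only? is very close to what you want as well: it returns true when every character in the string is ASCII, although that also admits punctuation and control characters outside the list above.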
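To make the difference from the whitelist regex concrete, here is a small sketch of how String#ascii_only? behaves on the question's two strings and on a third string that is ASCII but contains characters outside the whitelist. The variable names are only illustrative.

```ruby
# encoding: UTF-8

# String#ascii_only? is true when every character is in the ASCII range.
ascii_ok    = "123sdjjsf-4KSD".ascii_only?  # letters, digits, hyphen
cyrillic    = "12333ыц4--sdf".ascii_only?   # contains Cyrillic letters
punctuation = "a@b!c\t".ascii_only?         # ASCII, but not in the whitelist

puts ascii_ok     # prints true
puts cyrillic     # prints false
puts punctuation  # prints true
```

So ascii_only? is enough if any ASCII character is acceptable; if "@" or a tab should be rejected, the explicit character class is the safer choice.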

Upvotes: 7

Martin Ender

Reputation: 44259

The following regular expression will match a single letter character that is not Latin:

[\p{L}&&[^a-zA-Z]]

The && syntax intersects two character classes. The first one (\p{L}) matches any Unicode letter. The second one ([^a-zA-Z]) matches any character that is not (^) a basic Latin letter (a-z or A-Z). That is, the whole character class matches any letter that is not a basic Latin one.

See it working on Rubular.

So if you use this regular expression inside just_latin? and return true if no match is found, it should work just like you want it to.
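A minimal sketch of that idea, assuming the method is added to String as in the question. The constant name NON_LATIN_LETTER is made up for this example.

```ruby
# encoding: UTF-8

class String
  # Any Unicode letter (\p{L}) that is not one of a-z or A-Z.
  NON_LATIN_LETTER = /[\p{L}&&[^a-zA-Z]]/

  # True when no non-Latin letter occurs anywhere in the string.
  def just_latin?
    self !~ NON_LATIN_LETTER
  end
end

puts "123sdjjsf-4KSD".just_latin?  # prints true
puts "12333ыц4--sdf".just_latin?   # prints false
```

Because this only rejects letters, any non-letter symbol passes, not just -, _ and +. If the set of allowed symbols has to be closed, an explicit whitelist is the stricter option.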

I first tried the Unicode property \p{Latin} for the second character class, but that is not entirely reliable, since \p{Latin} includes, for instance, the Icelandic characters þ, æ, ð.
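The difference can be checked directly. The sketch below compares \p{Latin} with the plain [a-zA-Z] class on a few single characters; the helper names latin_script? and ascii_letter? are hypothetical and exist only for this illustration.

```ruby
# encoding: UTF-8

# True when the string contains a character from the Unicode Latin script.
def latin_script?(str)
  !!(str =~ /\p{Latin}/)
end

# True when the string contains one of the 52 basic letters a-z or A-Z.
def ascii_letter?(str)
  !!(str =~ /[a-zA-Z]/)
end

%w[a þ æ ð ы].each do |ch|
  puts "#{ch}: latin_script=#{latin_script?(ch)} ascii_letter=#{ascii_letter?(ch)}"
end
```

Here þ, æ and ð count as Latin script but are not in a-z or A-Z, while the Cyrillic ы fails both checks.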

Upvotes: 5
