Loenvpy

Reputation: 919

Ruby on Rails: Regex to include accented and special characters?

In my Rails app I want to use a regex that accepts accented characters (é ç à, ...) and special characters (& () " ' , ...). Right now this is my validation:

validates_format_of :job_title, 
                      :with =>  /[a-zA-Z0-9]/, 
                      :message  => "le titre de l'offre n'est pas valide",
                      :multiline => true 

I also want the regex to reject non-Latin characters such as Arabic, Chinese, ...

Upvotes: 1

Views: 1941

Answers (3)

matt

Reputation: 79803

For the Latin characters you could use the \p{Latin} script character property. You would have to make sure you normalize the input first, as decomposed strings (i.e. strings that build accented letters from a base letter plus combining characters) won’t match. Even after normalization this wouldn’t match things like x́ (that’s x followed by COMBINING ACUTE ACCENT), since that sequence has no single composed character, but that’s probably okay as it’s not likely to be actually used by anyone.
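As a minimal sketch of the normalization step, assuming Ruby 2.4 or later (for `String#unicode_normalize` and `match?`) and UTF-8 input; the helper name `latin_only?` is made up for illustration:

```ruby
# Hypothetical helper: compose the input to NFC before checking its script.
def latin_only?(text)
  text.unicode_normalize(:nfc).match?(/\A\p{Latin}*\z/)
end

decomposed = "e\u0301" # "e" followed by COMBINING ACUTE ACCENT

# Without normalization the combining mark is not a Latin-script character.
puts decomposed.match?(/\A\p{Latin}*\z/) # false
# After NFC the pair becomes the single character é, which is Latin.
puts latin_only?(decomposed)             # true
# x + COMBINING ACUTE ACCENT has no composed form, so it still fails.
puts latin_only?("x\u0301")              # false
```

In a Rails model the same call could run in a callback before validation, but where exactly to normalize is a design choice this sketch leaves open.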

For the “special characters” you really need to be more specific about what you want. You say you want to allow " and ' (so-called “straight” quotes), but what about ‘, ’, “ and ” (“typographical” or “curly” quotes)? And since you are allowing European languages, what about «, », and the other quotation marks those languages use? You could use the \p{Punct} class, which should match all these and more; you will need to decide whether it matches too much.

You probably also want to match spaces as well. Will just the space character be okay? What about tabs, non-breaking spaces, newlines etc.? \p{Space} should get them.

There may be other characters you need to match that these won’t pick up, e.g. currency symbols; you may need to add those too.

So a first attempt at your regex might look like this (I’ve added \A and \z to anchor the start and end, as well as * to match all characters – I think you will need them):

/\A[\p{Latin}\p{Punct}\p{Space}0-9]*\z/
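To get a feel for what this pattern accepts, here is a small sketch in plain Ruby (no Rails needed). The constant name `JOB_TITLE_FORMAT` and the sample titles are invented for the example, and the input is assumed to be NFC-normalized UTF-8:

```ruby
# The regex from the answer, stored under a made-up constant name.
JOB_TITLE_FORMAT = /\A[\p{Latin}\p{Punct}\p{Space}0-9]*\z/

samples = [
  "Développeur (Ruby & Rails)",   # accented letter, parentheses, ampersand
  "Chef d'équipe, « senior »",    # apostrophe, comma, guillemets
  "\u0645\u0647\u0646\u062F\u0633", # Arabic word
  "工程师",                        # Chinese word
  "Prix: 50 €"                    # currency symbol
]

samples.each do |title|
  puts "#{title.inspect} => #{JOB_TITLE_FORMAT.match?(title)}"
end
# The first two titles match; the Arabic, Chinese and € titles do not.
```

Because of the `*`, the empty string also matches, so a separate presence check is probably still wanted. The pattern could then be passed as the `:with` option of the `validates_format_of` call from the question.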

Upvotes: 2

Alfonso

Reputation: 759

A simple option is to white-list all the characters you want to accept. For example:

/[a-zA-Z0-9áéíóúÁÉÍÓÚÑñ&*]/

Instead of a-zA-Z0-9 you can use \w. In Ruby it matches any ASCII word character (letter, digit, or underscore), so it also lets underscores through.

/[\wáéíóúÁÉÍÓÚÑñ&*]/
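One thing worth checking with a white-list like this: a bare character class only requires one listed character somewhere in the string. The sketch below compares it with an anchored variant. The anchors and the `+` are an addition for illustration, not part of the answer, and both constant names are made up:

```ruby
# The character class from the answer, unanchored.
UNANCHORED = /[a-zA-Z0-9áéíóúÁÉÍÓÚÑñ&*]/
# Assumed variant: \A, \z and + force every character to be on the list.
WHITELIST  = /\A[a-zA-Z0-9áéíóúÁÉÍÓÚÑñ&*]+\z/

puts UNANCHORED.match?("工程师 a")  # true: the single "a" is enough
puts WHITELIST.match?("工程师 a")   # false: other characters are rejected
puts WHITELIST.match?("Diseñador") # true
puts "é".match?(/\w/)              # false: \w is ASCII-only in Ruby
```

Note that the list contains no space, so multi-word titles would need a space (or \p{Space}) added to the class.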

Upvotes: 1

Uri Agassi

Reputation: 37419

Use [:alnum:] for alphanumeric characters:

validates_format_of :job_title, 
                  :with =>  /[[:alnum:]]/, 
                  :message  => "le titre de l'offre n'est pas valide",
                  :multiline => true 
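A quick way to see what [:alnum:] covers is to try it on a few strings. This is only a sketch, assuming UTF-8 strings, where Ruby's POSIX bracket classes are Unicode-aware; the constant name `ALNUM` is made up:

```ruby
# The pattern from the validation above, under a made-up constant name.
ALNUM = /[[:alnum:]]/

puts ALNUM.match?("é")      # true: accented letters count as alphanumeric
puts ALNUM.match?("工程师")  # true: so do Chinese characters
puts ALNUM.match?("\u0645\u0647\u0646\u062F\u0633") # true: and Arabic letters
puts ALNUM.match?("&&&")    # false: punctuation alone does not match
```

So this class handles the accented letters, but on its own it does not exclude non-Latin scripts, and without \A ... \z it only requires one alphanumeric character somewhere in the title.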

Upvotes: 2
