Reputation: 5111

Regex that checks upper or lower case characters with or without accents

How can I make the following regular expression keep whitespace instead of stripping it?

$foo = ereg_replace("[^áéíóúÁÉÍÓÚñÑa-zA-Z]", "", $_REQUEST["bar"]);

Input: Ingeniería Eléctrica'*;<42

Current Output: IngenieríaEléctrica

Desired Output: Ingeniería Eléctrica

I tried adding /s, \s\s*, \s+, /\s+/, /t and /r, among others, and they all failed.

Objective: A regex that accepts only strings made of upper or lower case letters, with or without (Spanish) accents.

Thank you!

Upvotes: 1

Answers (4)

Artefacto

Reputation: 97815

All the answers so far fail to point out that listing the accented characters by hand is a hack, and an incomplete one: for instance, no grave accents are matched.

The best way is to use the mbstring extension:

mb_regex_encoding("UTF-8"); //or whatever encoding you're using
var_dump(mb_ereg_replace("[^\\w\\s]|[0-9]", "", "Ingeniería Eléctrica'*;<42", "z"));

gives

string(22) "Ingeniería Eléctrica"
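
If mbstring is not available, a similar effect can be sketched with the PCRE functions and Unicode properties. This is a different technique from the mb_ereg_replace call above, not a drop-in equivalent: it assumes the input is valid UTF-8, and the helper name keepLettersAndSpaces is made up for illustration. Unlike \w, \p{L} matches letters only, so digits and underscores are dropped without a separate alternative.

```php
<?php
// Hypothetical helper: keep Unicode letters and whitespace, drop everything else.
// Assumes $input is valid UTF-8 (the u modifier requires it).
function keepLettersAndSpaces(string $input): string
{
    // \p{L} = any Unicode letter (acute, grave, ñ, ü, ...); \s = whitespace.
    $result = preg_replace('/[^\p{L}\s]/u', '', $input);
    // preg_replace returns null on error, e.g. malformed UTF-8.
    return $result ?? '';
}

var_dump(keepLettersAndSpaces("Ingeniería Eléctrica'*;<42"));
// string(22) "Ingeniería Eléctrica"
```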

Upvotes: 0

Felix Kling

Reputation: 816374

ereg_replace uses POSIX Extended Regular Expressions and there, POSIX bracket expressions are used.

Now the important thing to know is that inside bracket expressions, \ is not a meta-character and therefore \s won't work.

But you can use the POSIX character class [:space:] inside the POSIX bracket expression to achieve the same effect:

$foo = ereg_replace("[^áéíóúÁÉÍÓÚñÑa-zA-Z[:space:]]", "", $_REQUEST["bar"]);

As you can see, this differs from the (I think) better-known Perl syntax. The POSIX regular expression functions are deprecated as of PHP 5.3 and were removed in PHP 7.0, so you really should switch to the Perl-compatible ones.
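
As a rough sketch of that migration, the same bracket expression can be handed to preg_replace, since PCRE also understands [:space:] inside a character class. The example assumes UTF-8 input and a UTF-8 encoded source file; $input is a stand-in for $_REQUEST["bar"].

```php
<?php
// Sketch: the ereg_replace pattern from above moved to preg_replace.
// Assumes UTF-8 input; $input stands in for $_REQUEST["bar"].
$input = "Ingeniería Eléctrica'*;<42";

// PCRE needs delimiters (here /.../). The u modifier makes it treat pattern
// and subject as UTF-8, so á, é, ... count as single characters.
$foo = preg_replace('/[^áéíóúÁÉÍÓÚñÑa-zA-Z[:space:]]/u', '', $input);

var_dump($foo);
// string(22) "Ingeniería Eléctrica"
```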

Upvotes: 0

Chris

Reputation: 10338

I see no reason why adding \s to that regex would not work, provided you use preg_replace. \s matches any whitespace character.

$foo = preg_replace("/[^áéíóúÁÉÍÓÚñÑa-zA-Z\s]/", "", $_REQUEST["bar"]);
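
One caveat, assuming the input is UTF-8: without the u modifier PCRE works byte by byte, so the accented letters in the class are seen as their individual bytes. For the sample input the result is the same, but a character that is not in the list and shares a lead byte with a listed one can be cut in half. A small sketch of the difference (the variable names and the test string are illustrative only):

```php
<?php
// Same character class as above, with and without the u modifier.
// Assumes this source file and the input are UTF-8 encoded.
$class = '[^áéíóúÁÉÍÓÚñÑa-zA-Z\s]';
$input = "Müller 42";

$bytewise = preg_replace('/' . $class . '/', '', $input);   // byte mode
$utf8     = preg_replace('/' . $class . '/u', '', $input);  // UTF-8 mode

var_dump($utf8 === "Mller ");          // bool(true): ü is removed as a whole
var_dump($bytewise === "M\xC3ller ");  // bool(true): only half of ü is removed
```

In byte mode the result is no longer valid UTF-8, which is why the u modifier is worth adding whenever the pattern contains multibyte characters.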

Upvotes: 3

ahmetunal

Reputation: 3950

I believe this should work:

$foo = ereg_replace("[^áéíóúÁÉÍÓÚñÑa-zA-Z ]", "", $_REQUEST["bar"]);

Upvotes: 0
