Abozanona

Validate a combination of Arabic and English characters

I want to validate a string in which every character is an Arabic or English letter, one of the symbols -, . or ـ (tatweel), or a space.

The first regex I came up with was:

/^([\u0600-\u06ff\u0750-\u077f\ufb50-\ufc3f\ufe70-\ufefca-zA-Z\- .ـ]+)$/

This worked fine in JavaScript but not with PCRE (PHP) validation, because PCRE does not support the \uXXXX escape. So I tried another approach that avoids \u in the pattern:
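
For reference, PCRE's counterpart to JavaScript's \uXXXX is \x{XXXX}, and code points above \x{FF} are only accepted when the /u modifier is set. A minimal sketch of the first pattern ported that way, assuming a PHP build with the bundled PCRE library; the variable name is illustrative:

```php
<?php
// Same ranges as the JavaScript pattern, written with PCRE's \x{...} escape.
// /u is required: without it, \x{0600} and above fail to compile.
$pattern = '/^[\x{0600}-\x{06FF}\x{0750}-\x{077F}\x{FB50}-\x{FC3F}\x{FE70}-\x{FEFC}a-zA-Z\- .ـ]+$/u';

var_dump(preg_match($pattern, "engعربlisي هنا.hـ")); // int(1)
var_dump(preg_match($pattern, "abc123"));            // int(0): digits are not allowed
```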

/^[\p{Arabic}a-zA-Z\- .ـ]+$/

This regex gave me no error and worked exactly as I needed.

However, when I tested the same text in PHP, it did not match:

if ( preg_match('/^[\p{Arabic}a-zA-Z\- .ـ]+$/', "engعربlisي هنا.hـ") )
      die("T");
else
      die("F");

The code printed F instead of T. Why is that?

Answers (1)

Wiktor Stribiżew

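A Unicode property class such as \p{Arabic} is not enough by itself to make a PHP regex match UTF-8 strings.

You also need the /u modifier, which makes PCRE treat both the pattern and the subject as UTF-8. Without it, PCRE works on individual bytes: each Arabic letter in the subject is a two-byte UTF-8 sequence, while \p{Arabic} is only tested against single byte values, so the Arabic letters never match (the literal ـ inside the class is likewise split into two separate bytes). From the PHP manual's Pattern Modifiers page: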

u (PCRE_UTF8)
This modifier turns on additional functionality of PCRE that is incompatible with Perl. Pattern and subject strings are treated as UTF-8. This modifier is available from PHP 4.1.0 or greater on Unix and from PHP 4.2.3 on win32. UTF-8 validity of the pattern and the subject is checked since PHP 4.3.5. An invalid subject will cause the preg_* function to match nothing; an invalid pattern will trigger an error of level E_WARNING. Five and six octet UTF-8 sequences are regarded as invalid since PHP 5.3.4 (resp. PCRE 7.3 2007-08-28); formerly those have been regarded as valid UTF-8.
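
To see the byte-versus-character difference directly, the sketch below counts what `.` matches in the question's subject with and without /u, then runs the question's pattern both ways. It assumes only the standard preg_* functions; the variable names are illustrative.

```php
<?php
$subject = "engعربlisي هنا.hـ";

// Without /u, '.' matches single bytes; each Arabic letter is 2 bytes in UTF-8.
$bytes = preg_match_all('/./s', $subject);
// With /u, '.' matches whole characters (code points).
$chars = preg_match_all('/./su', $subject);
echo $bytes, " ", $chars, "\n"; // 25 17

// The question's pattern: byte mode fails, UTF-8 mode succeeds.
var_dump(preg_match('/^[\p{Arabic}a-zA-Z\- .ـ]+$/', $subject));  // int(0)
var_dump(preg_match('/^[\p{Arabic}a-zA-Z\- .ـ]+$/u', $subject)); // int(1)
```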

Thus:

if ( preg_match('/^[\p{Arabic}a-zA-Z\- .ـ]+$/u', "engعربlisي هنا.hـ") )
//                                          ^^
  die("T");
else
  die("F");

Outputs T.
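
If the check is reused, it can be wrapped in a small function. This is a sketch assuming PHP 7+ (for the scalar type declarations); the name isArabicOrEnglish is hypothetical. Comparing the result with 1 also covers the manual's note above: for an invalid UTF-8 subject preg_match returns false, and preg_last_error() reports PREG_BAD_UTF8_ERROR.

```php
<?php
// Hypothetical helper: true only if $text consists solely of Arabic-script
// characters, ASCII letters, '-', '.', 'ـ' (tatweel) or spaces.
function isArabicOrEnglish(string $text): bool
{
    // preg_match returns 1 (match), 0 (no match) or false (error, e.g.
    // invalid UTF-8 under /u); only 1 counts as valid.
    return preg_match('/^[\p{Arabic}a-zA-Z\- .ـ]+$/u', $text) === 1;
}

var_dump(isArabicOrEnglish("engعربlisي هنا.hـ"));    // bool(true)
var_dump(isArabicOrEnglish("abc123"));               // bool(false)
var_dump(isArabicOrEnglish("\xD8\xB9\xD8"));         // bool(false): truncated UTF-8
var_dump(preg_last_error() === PREG_BAD_UTF8_ERROR); // bool(true)
```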
