Why does the pattern match only one word when there are two identical words?

Question

Please take a look at this:

As you can see, regex101 finds just one match, but the browser matches two words that look identical. So why can't regex101 match the second word? In any case, I need to match both words (or more, if they exist).

Note that this isn't related to the g flag, because I've used it in the fiddle.

Here is the fiddle

revo · Accepted Answer

Text like this is hard to work with later on. With @Wiktor's solution, you have to look up the different representations of each letter again whenever you change the search word from مجلس to something else, such as احمدی نژاد.

That's why normalization comes in handy:

Normalization is a process that involves transforming characters and sequences of characters into a formally-defined underlying representation. This process is most important when text needs to be compared for sorting and searching, but it is also used when storing text to ensure that the text is stored in a consistent representation.

We need to normalize the input string first, using Normalizer::normalize(); then, without any change to the regular expression, we can safely run preg_match_all over it:
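A minimal sketch of that step, under stated assumptions: the sample input is hypothetical, built so that the second word uses Arabic presentation forms (U+FExx code points) that render the same as the ordinary letters, and the normalization form is assumed to be Normalizer::FORM_KC, since the default FORM_C does not fold compatibility characters such as presentation forms.

```php
<?php
// Hypothetical input: the first word uses ordinary Arabic letters,
// the second uses presentation forms that look identical on screen.
$plain        = "\u{0645}\u{062C}\u{0644}\u{0633}"; // meem, jeem, lam, seen
$presentation = "\u{FEE3}\u{FEA0}\u{FEE0}\u{FEB2}"; // initial, medial, medial, final forms
$text         = $plain . ' ' . $presentation;

// The pattern is built from the ordinary letters and is left unchanged.
$pattern = '/' . $plain . '/u';

// Without normalization only the first word matches.
preg_match_all($pattern, $text, $before);

// Assumption: compatibility composition (NFKC) maps the presentation
// forms back to the ordinary letters.
$normalized = Normalizer::normalize($text, Normalizer::FORM_KC);

// With normalization both words match.
preg_match_all($pattern, $normalized, $after);

print_r($after);
```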



Outputs:

Array
(
    [0] => Array
        (
            [0] => مجلس
            [1] => مجلس
        )

)


Note: this requires the intl extension (php_intl.dll on Windows) to be enabled.

Live demo
