Zamblek

Reputation: 809

Regex to match any string, whether Unicode or not?

I want to match any string between title tags:

$string = "<title>نص عربى English text</title>";

$pattern = '/<title>(regex.here)<\/title>/u';

if (preg_match_all($pattern, $string, $matches, PREG_SET_ORDER)) {
    print_r($matches);
} else {
    echo 'No matches.';
}

The output should be:

نص عربى English text

Upvotes: 2

Views: 641

Answers (4)

Lars Strojny

Reputation: 665

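If your PCRE is compiled with Unicode support, you can match against the Unicode letter category, \p{L}. Because \p{L} covers letters only, whitespace is added to the class so that multi-word titles match as well.

 <?php
 // [\p{L}\s]+ : one or more Unicode letters or whitespace characters
 preg_match_all('|<title>([\p{L}\s]+)</title>|u', $string, $matches);

Note the u modifier, which makes PCRE treat the pattern and the subject as UTF-8.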
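
To make the effect of the u modifier concrete, here is a minimal sketch. The variable names ($byteMode, $utf8Mode, $m) are illustrative, and it assumes a PHP build whose PCRE has Unicode support. Without u, '.' consumes a single byte, so a two-byte UTF-8 letter is not "one character". Note also that \p{L} alone does not cover the spaces between words, so the class here admits whitespace too.

```php
<?php
// Title text taken from the question.
$string = "<title>نص عربى English text</title>";

// Without the u modifier, '.' matches one byte; 'ن' is two bytes in UTF-8.
$byteMode = preg_match('/^.$/', 'ن');

// With the u modifier, '.' matches one code point.
$utf8Mode = preg_match('/^.$/u', 'ن');

// Letters (\p{L}) plus whitespace (\s), so multi-word titles are accepted.
preg_match('|<title>([\p{L}\s]+)</title>|u', $string, $m);
echo $m[1], "\n";
```

A title containing digits or punctuation would not match this class; in that case a broader pattern is needed.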

Upvotes: 2

rasjani

Reputation: 7970

I copied your code into a file, changed the expression to match anything between the title tags, and printed the first match:

<?PHP
$string = "<title>نص عربى English text</title>";
$pattern = '/<title>(.*)<\/title>/u';
if (preg_match_all($pattern, $string, $matches, PREG_SET_ORDER)) {
    print($matches[0][1]."\n");
} else {
    echo 'No matches.';
}
?>

Output:

rasjani@laptop:~$ php unitest.php 
نص عربى English text
rasjani@laptop:~$ 

Upvotes: 1

Soren

Reputation: 14688

The (??????) in your pattern will not match arbitrary text; in a regex, '?' is not a wildcard. To match any single character, use '.', and to match any number of them, use '.*'.

Matching HTML tags like that is not easy with regex, so you should probably use an HTML parser instead.

As an approximation, you could use something like /<title>([^<]*)<\/title>/, which works as long as the title text does not contain a '<'.
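
A short sketch of why the negated class is the safer approximation. The two-title input and the variable names ($html, $greedy, $titles) are hypothetical, chosen only to show the difference from a greedy '.*'.

```php
<?php
// Hypothetical input with two title elements.
$html = "<title>نص عربى</title><title>English text</title>";

// Greedy '.*' runs to the last closing tag, swallowing the markup in between.
preg_match('/<title>(.*)<\/title>/u', $html, $greedy);

// '[^<]*' stops at the first '<', so each title is captured separately.
preg_match_all('/<title>([^<]*)<\/title>/u', $html, $titles);

print_r($titles[1]);
```

Here $greedy[1] includes the inner "</title><title>", while $titles[1] holds the two title strings on their own.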

Upvotes: 0

Tudor Constantin

Reputation: 26861

Try this:

$string = "<title>نص عربى English text</title>";

$pattern = '/<title>([\x{0000}-\x{FFFF}]*.*?)<\/title>/u';

if (preg_match_all($pattern, $string, $matches, PREG_SET_ORDER)) {
    print_r($matches);
} else {
    echo 'No matches.';
}
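
If the title may span several lines, a lazy '.*?' combined with the s modifier is one possible variant. This is a sketch under that assumption; the multi-line input is hypothetical and not from the question.

```php
<?php
// Hypothetical multi-line title.
$html = "<title>نص عربى\nEnglish text</title>";

// u: treat pattern and subject as UTF-8; s: let '.' match newlines as well.
// '.*?' is lazy, so it stops at the first closing tag.
if (preg_match('/<title>(.*?)<\/title>/us', $html, $m)) {
    echo $m[1], "\n";
} else {
    echo "No matches.\n";
}
```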

Upvotes: 2
