Reputation: 809
I want to match any string between the title tags:
$string = "<title>نص عربى English text</title>";
$pattern = '/<title>(regex.here)<\/title>/u';
if (preg_match_all($pattern, $string, $matches, PREG_SET_ORDER)) {
print_r($matches);
} else {
echo 'No matches.';
}
The output should be:
نص عربى English text
Upvotes: 2
Views: 641
Reputation: 665
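If your PCRE is compiled with Unicode support, you can match against the Unicode letter property \p{L}. Because \p{L} matches letters only, whitespace has to be added to the character class for a title that contains spaces.
<?php
preg_match_all('|<title>([\p{L}\s]+)</title>|u', $string, ...);
Note the u modifier, which makes PCRE treat the pattern and the subject as UTF-8 and enables Unicode matching.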
Upvotes: 2
Reputation: 7970
I copied your code into a file, changed the pattern to capture anything between the title tags, and printed the first match:
<?php
$string = "<title>نص عربى English text</title>";
$pattern = '/<title>(.*)<\/title>/u';
if (preg_match_all($pattern, $string, $matches, PREG_SET_ORDER)) {
print($matches[0][1]."\n");
} else {
echo 'No matches.';
}
?>
Output:
rasjani@laptop:~$ php unitest.php
نص عربى English text
rasjani@laptop:~$
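One caveat with '.*' is that it is greedy: if the input contains several title elements, the capture runs to the last closing tag. The sketch below uses a made-up two-title string (not from the original post) to compare it with the lazy form '.*?':

```php
<?php
// Hypothetical input with two title elements, for illustration only.
$string = "<title>نص عربى</title><title>English text</title>";

// Greedy: (.*) extends to the LAST </title>, so both titles land in one capture.
preg_match_all('/<title>(.*)<\/title>/u', $string, $greedy);

// Lazy: (.*?) stops at the FIRST </title>, giving one capture per element.
preg_match_all('/<title>(.*?)<\/title>/u', $string, $lazy);

// With the default PREG_PATTERN_ORDER flag, index 1 holds the group-1 captures.
print_r($greedy[1]);
print_r($lazy[1]);
```

With a single title in the input, as in the question, both forms capture the same text.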
Upvotes: 1
Reputation: 14688
The (??????) is not a wildcard: in a regex, '?' is a quantifier that makes the preceding item optional, so it does not match arbitrary characters. To match any character, use '.', and to match any number of them, use '.*'.
Matching HTML tags like that is not easy with a regex, so you should probably use an HTML parser instead.
As an approximation, you could do something like
/<title>([^<]*)<\/title>/
This works as long as the title text does not contain a '<'.
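For the HTML parser route, a minimal sketch with PHP's DOMDocument might look like the following. It assumes the dom extension is enabled, and the surrounding html/head markup is made up for illustration. The encoding prefix is a commonly used workaround rather than an official option, so treat it as an assumption to verify on your setup:

```php
<?php
// Hypothetical document wrapping the title from the question.
$html = "<html><head><title>نص عربى English text</title></head><body></body></html>";

// Collect libxml parse warnings instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
// loadHTML() does not assume UTF-8 by default; prefixing an XML encoding
// declaration is a common workaround so the input is read as UTF-8.
$doc->loadHTML('<?xml encoding="UTF-8">' . $html);

// Take the text of the first <title> element, if there is one.
$titles = $doc->getElementsByTagName('title');
$title = $titles->length > 0 ? $titles->item(0)->textContent : null;

echo $title === null ? "No title found.\n" : $title . "\n";
```

Unlike the regex approximation, this does not depend on what characters appear inside the title.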
Upvotes: 0
Reputation: 26861
Try this:
$string = "<title>نص عربى English text</title>";
$pattern = '/<title>([\x{0000}-\x{FFFF}]*.*?)<\/title>/u';
if (preg_match_all($pattern, $string, $matches, PREG_SET_ORDER)) {
print_r($matches);
} else {
echo 'No matches.';
}
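A note on the escape syntax: in PCRE, code points above 0xFF need the braced form \x{FFFF}. Without braces, \xFFFF is read as \xFF followed by the literal characters "FF". The sketch below, which uses a single Arabic letter as a made-up test subject, shows the difference:

```php
<?php
// Test subject: Arabic letter noon (U+0646), chosen for illustration only.
$char = "ن";

// Unbraced: the class is the range U+0000..U+00FF plus the literal letter "F".
$unbraced = preg_match('/^[\x{0000}-\xFFFF]$/u', $char);

// Braced: the class is the full range U+0000..U+FFFF.
$braced = preg_match('/^[\x{0000}-\x{FFFF}]$/u', $char);

var_dump($unbraced); // 0 when the letter is not matched
var_dump($braced);   // 1 when the letter is matched
```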
Upvotes: 2