sam

replace all occurrences of a string

I want to add a class to every <p> tag that contains Arabic text. For example:

<p>لمبارة وذ</p> 
<p>do nothing</p> 
<p>خمس دقائق يخ</p> 
<p>مراعاة إبقاء 3 لاعبين</p>

should become

<p class="foo">لمبارة وذ</p> 
<p>do nothing</p>
<p class="foo">خمس دقائق يخ</p> 
<p class="foo">مراعاة إبقاء 3 لاعبين</p>

I am trying to use PHP's preg_replace function to match Arabic text with the following expression:

preg_replace("~(\p{Arabic})~u", "<p class=\"foo\">$1", $string, 1);

However, it does not work properly. It has two problems:

  1. It only matches the first paragraph.
  2. It adds an extra, empty <p>.
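A minimal reproduction of the call above on three of the sample lines shows where both symptoms come from: the pattern matches a single Arabic letter, the limit of 1 stops after the first match, and the original <p> is never part of the match, so it stays in front of the inserted tag.

```php
<?php
// Three lines of the sample input from the question.
$string = "<p>لمبارة وذ</p>\n<p>do nothing</p>\n<p>خمس دقائق يخ</p>";

// The call from the question: (\p{Arabic}) captures one Arabic letter,
// and the fourth argument (limit = 1) stops after the first replacement.
$out = preg_replace("~(\p{Arabic})~u", "<p class=\"foo\">$1", $string, 1);

// The first line becomes <p><p class="foo">لمبارة وذ</p>;
// the remaining paragraphs are left untouched.
echo $out, "\n";
```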

Answers (3)

Casimir et Hippolyte

Using DOMDocument and DOMXPath:

$html = <<<'EOD'
<p>لمبارة وذ</p> 
<p>خمس دقائق يخ</p> 
<p>مراعاة إبقاء 3 لاعبين</p>
EOD;

libxml_use_internal_errors(true);

$dom = new DOMDocument;
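// Prefix an XML encoding declaration so libxml reads the input as UTF-8;
// without it, the Arabic text may be parsed as ISO-8859-1 and never match \p{Arabic}
$dom->loadHTML('<?xml encoding="UTF-8"><div>'.$html.'</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);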

$xpath = new DOMXPath($dom);

// here you register the php namespace and the preg_match function
// to be able to use it in the XPath query
$xpath->registerNamespace("php", "http://php.net/xpath");
$xpath->registerPhpFunctions('preg_match');

// select only p nodes with at least one arabic letter
$pNodes = $xpath->query("//p[php:functionString('preg_match', '~\p{Arabic}~u', .) > 0]");

foreach ($pNodes as $pNode) {
    $pNode->setAttribute('class', 'foo');
}

$result = '';
foreach ($dom->documentElement->childNodes as $childNode) {
    $result .= $dom->saveHTML($childNode);
}

echo $result;
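If registering PHP functions for XPath is undesirable, the same filter can run in plain PHP over the <p> elements. This sketch is a variant, not the answer's code: it assumes the mbstring extension is available and encodes non-ASCII characters as numeric entities before parsing, so libxml cannot misread the UTF-8 input as ISO-8859-1.

```php
<?php
$html = "<p>لمبارة وذ</p>\n<p>do nothing</p>\n<p>مراعاة إبقاء 3 لاعبين</p>";

libxml_use_internal_errors(true);

// Assumption: turn every non-ASCII character into &#N; so the parser
// sees plain ASCII and decodes the entities back to Unicode itself.
$encoded = mb_encode_numericentity($html, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');

$dom = new DOMDocument();
$dom->loadHTML('<div>' . $encoded . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

// textContent is returned as UTF-8, so the /u pattern can test it directly.
foreach ($dom->getElementsByTagName('p') as $p) {
    if (preg_match('~\p{Arabic}~u', $p->textContent)) {
        $p->setAttribute('class', 'foo');
    }
}

// Collect each paragraph's class attribute ('' when it has none).
$classes = [];
foreach ($dom->getElementsByTagName('p') as $p) {
    $classes[] = $p->getAttribute('class');
}

print_r($classes);
```

Filtering in PHP trades the compact XPath query for code that does not depend on DOMXPath::registerPhpFunctions.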

trincot

It only matches the first paragraph.

This is because you passed a fourth argument ($limit = 1), which tells preg_replace to replace only the first occurrence. Leave that argument out.
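The effect of the $limit argument (which defaults to -1, meaning no limit) shows up with two calls that differ only in that argument. This is an illustrative sketch on a short, made-up two-paragraph string:

```php
<?php
$string = "<p>لمبارة</p><p>خمس</p>";
$pattern = "~<p>(\p{Arabic}+)~u";

// limit = 1: only the first <p> is rewritten.
$one = preg_replace($pattern, '<p class="foo">$1', $string, 1);

// No limit: every matching <p> is rewritten.
$all = preg_replace($pattern, '<p class="foo">$1', $string);

echo $one, "\n", $all, "\n";
```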

Adds an empty <p>.

This is actually the original <p>, which your pattern did not match, so it stays in front of the new tag. Add it to the pattern but keep it outside the capture group, so it is dropped when you replace with $1.

Here is a corrected version:

$text = preg_replace("~<p>(\p{Arabic}+)~u", "<p class=\"foo\">$1", $string);
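Run against the full input from the question, including the "do nothing" line, the corrected call tags only the Arabic paragraphs. This is a small self-contained check, assuming (as the pattern does) that the Arabic text starts immediately after <p>:

```php
<?php
$string = <<<'EOD'
<p>لمبارة وذ</p>
<p>do nothing</p>
<p>خمس دقائق يخ</p>
<p>مراعاة إبقاء 3 لاعبين</p>
EOD;

// <p> is consumed by the match but kept outside the group, so it is
// replaced rather than duplicated; $1 restores the leading Arabic word.
$text = preg_replace("~<p>(\p{Arabic}+)~u", "<p class=\"foo\">$1", $string);

echo $text, "\n";
```

A paragraph where something else comes first, such as a digit or a space before the Arabic text, would not be tagged by this pattern.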

Laurel

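Your first problem is that your pattern never matched the <p> itself, so the original tag stayed in front of the new one.

Your main problem is that spaces aren't Arabic. Allowing whitespace in the match, while still requiring at least one Arabic letter right after the <p> (otherwise <p>do nothing</p> would match too), fixes your problem:

$text = preg_replace("~<p>(\p{Arabic}[\p{Arabic}\s]*)~u", "<p class=\"foo\">$1", $string);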
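If the Arabic text does not start right after <p> (for example, a paragraph beginning with a digit), a lookahead can check for an Arabic letter anywhere before the next tag without consuming any text. This is a minimal sketch of a different technique than the answers above, assuming paragraphs contain no nested tags before their Arabic text, since [^<]* stops at the first <; the sample line "3 لاعبين" is made up for illustration.

```php
<?php
$string = "<p>لمبارة وذ</p>\n<p>do nothing</p>\n<p>3 لاعبين</p>";

// Match "<p>" only if an Arabic letter follows before the next "<".
// Assumption: no nested tags precede the Arabic text in a paragraph.
// The lookahead consumes nothing, so the replacement is just the new tag.
$text = preg_replace('~<p>(?=[^<]*\p{Arabic})~u', '<p class="foo">', $string);

echo $text, "\n";
```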
