夏期劇場

Reputation: 18325

PHP to Detect and Split Out HTML Special Character Codes in a String?

In PHP, when I read the data, suppose the data (a chunk of string) contains HTML special character codes (decimal or hex numeric entities) like:
This is a sample string with &lt; &#x153; &lt; and &#x161;

What I want to know is how to detect and split out the decimal or hex codes (of any special characters) inside a chunk of string.

For example, above string contains:

How can I programmatically detect each occurrence of any HTML special character code?
(Collecting the results in an array would be best.)

Upvotes: 0

Views: 882

Answers (4)

Max Kuznetsov

Reputation: 156

You should use preg_match_all() - http://www.php.net/manual/en/function.preg-match-all.php with a pattern like '/&#?[0-9a-zA-Z]{1,8};/'. (preg_match() stops at the first match, and PHP's PCRE functions have no g modifier.)

[Updated]: Decide which entities you need. Is it just &#x[number][number][number]; or all possible HTML entities (like &nbsp;, &lt;, etc.)?

Above I described the most common case.
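
To make the distinction between numeric and named entities concrete, here is a minimal sketch. The helper name findEntities() and the 2 to 32 character limit on entity names are my own assumptions, not something from the answer above.

```php
<?php
// Hypothetical helper (not part of the original answer): collect every
// entity-like token in the order it appears.
function findEntities(string $text): array
{
    // Three alternatives: hex numeric (&#x153;), decimal numeric (&#339;)
    // and named (&lt;). The name length limit is an assumption.
    $pattern = '/&(?:#[xX][0-9a-fA-F]+|#[0-9]+|[a-zA-Z][a-zA-Z0-9]{1,31});/';
    preg_match_all($pattern, $text, $matches);
    return $matches[0];
}

$found = findEntities('Tom &amp; Jerry: &#x153; &#339; &lt;b&gt; AT&T');
print_r($found);
```

The trailing AT&T is not reported because it has no terminating semicolon. Note that this only checks the shape of a token, not whether a named entity really exists in HTML.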

Upvotes: 1

Peter Ilfrich

Reputation: 3816

You could use substr and strpos to find &# and skip to the next ;:

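$string = "This is a sample string with &#x153; and &#x161;";
$hexCodes = array();
while (strlen($string) > 0) {
  // strpos() takes the haystack first and returns false when nothing is found
  $start = strpos($string, "&#");
  if ($start !== false) {
    $string = substr($string, $start);
    $end = strpos($string, ";");
    if ($end === false) { break; } // unterminated entity, stop here
    $hex = substr($string, 0, $end + 1);
    $string = substr($string, $end + 1);
    array_push($hexCodes, $hex);
  }
  else { break; }
}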
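
A possible variation on the same idea, sketched under my own assumptions: use the offset argument of strpos() instead of trimming the string on every pass, which also lets you record where each code was found. The function name scanNumericEntities() and the result layout are hypothetical.

```php
<?php
// Hypothetical helper: scan with strpos() offsets and keep the position
// of every "&#...;" span that is found.
function scanNumericEntities(string $text): array
{
    $codes = [];
    $offset = 0;
    while (($start = strpos($text, '&#', $offset)) !== false) {
        $end = strpos($text, ';', $start);
        if ($end === false) {
            break; // no terminating semicolon left
        }
        $codes[] = [
            'entity' => substr($text, $start, $end - $start + 1),
            'offset' => $start,
        ];
        $offset = $end + 1; // continue after the semicolon
    }
    return $codes;
}

$codes = scanNumericEntities('a &#x153; b &#161; c');
print_r($codes);
```

Like the loop above, this does not validate what sits between &# and the semicolon, so a regular expression is the safer choice if the input may contain stray ampersands.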

Upvotes: 1

JvdBerg

Reputation: 21856

I think this is what you are after:

$s = 'This is a sample string with &#x153; and &#x161;';

// [0-9a-f] with the i flag also matches hex digits a-f, which \d would miss
$pattern = '/&#x[0-9a-f]+;/i';

preg_match_all($pattern, $s, $matches);   

var_dump( $matches );

This will output:

array(1) {
  [0]=>
  array(2) {
    [0]=>
    string(7) "&#x153;"
    [1]=>
    string(7) "&#x161;"
  }
}
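
If you also want to split the codes out of the surrounding text, something along these lines might work. This is only a sketch that assumes hex entities; preg_split() with PREG_SPLIT_DELIM_CAPTURE and hexdec() are standard PHP functions, while the variable names are mine.

```php
<?php
$s = 'This is a sample string with &#x153; and &#x161;';

// The capturing group plus PREG_SPLIT_DELIM_CAPTURE keeps the entities
// in the result; PREG_SPLIT_NO_EMPTY drops the empty trailing piece.
$parts = preg_split(
    '/(&#x[0-9a-fA-F]+;)/',
    $s,
    -1,
    PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
);
print_r($parts);

// Convert each hex entity to its numeric code point.
preg_match_all('/&#x([0-9a-fA-F]+);/', $s, $m);
$codePoints = array_map('hexdec', $m[1]);
print_r($codePoints);
```

$parts then alternates between plain text and entity codes, and $codePoints holds the numeric values of the two codes.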

Upvotes: 3

Yash Singla

Reputation: 144

If you mean to decode the entities, use html_entity_decode. Here is an example:

<?php
$a = "I'll &quot;walk&quot; the &lt;b&gt;dog&lt;/b&gt;";

$b = html_entity_decode($a);

echo $b; // I'll "walk" the <b>dog</b>
?>
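
For the numeric codes from the question, decoding could look roughly like this. It is a sketch that assumes the target encoding is UTF-8; the sample string and variable names are made up for illustration.

```php
<?php
$encoded = 'sample &#x153; and &#x161; &amp; &#039;quoted&#039;';

// ENT_QUOTES also decodes single-quote entities; the third argument
// sets the output encoding (assumed to be UTF-8 here).
$decoded = html_entity_decode($encoded, ENT_QUOTES | ENT_HTML5, 'UTF-8');
echo $decoded, "\n";

// Rough presence check: the string changes only if something was decoded.
$hadEntities = ($decoded !== $encoded);
var_dump($hadEntities);
```

This tells you whether entities were present and what they decode to, but not where they were, so it complements the detection approaches in the other answers.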

Upvotes: -2
