abernier

Reputation: 28218

Parse a string of HTML attributes and potentially inline styling declarations into a multidimensional associative array

I've got a string with HTML attributes:

$attribs = ' id= "header " class = "foo   bar" style ="background-color:#fff; color: red; "';

How can I transform that string into an associative array, like:

array(
  'id' => 'header',
  'class' => array('foo', 'bar'),
  'style' => array(
    'background-color' => '#fff',
    'color' => 'red'
  )
)

so I can use the PHP array_merge_recursive() function to merge 2 sets of HTML attributes.
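For reference, a minimal sketch of what array_merge_recursive() does with two arrays in that target shape. Both arrays are made-up examples, not output from any parser in this thread. Note that scalar values under the same string key (such as id) are collected into an array rather than overwritten.

```php
<?php
// Hypothetical, already-parsed attribute sets in the target shape.
$a = array(
    'id'    => 'header',
    'class' => array('foo', 'bar'),
    'style' => array('color' => 'red'),
);
$b = array(
    'id'    => 'main',
    'class' => array('baz'),
    'style' => array('color' => 'blue', 'margin' => '0'),
);

$merged = array_merge_recursive($a, $b);

// Numeric keys are appended; identical string keys are merged into arrays.
print_r($merged['class']);          // foo, bar, baz
print_r($merged['id']);             // header, main
print_r($merged['style']['color']); // red, blue
```

If a later value should replace an earlier one instead (for id, say), array_replace_recursive() may be closer to what is needed.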

Upvotes: 12

Views: 17222

Answers (7)

Michael

Reputation: 9293

DOMDocument is the answer.

Take your original list of attributes, wrap it in some kind of tag, feed it through DOMDocument, and import the result into SimpleXML.

If an array is what you want, you can cast the SimpleXMLElement as an array and access its @attributes property as an array key.

$attribs = ' id= "header " class = "foo   bar" style ="background-color:#fff; color: red; "';

$params = new DOMDocument;
$params->loadHTML('<br'.$attribs.'/>');
$paramsXML = simplexml_import_dom($params->documentElement);
$paramsArray = ((array)$paramsXML->body->br->attributes())['@attributes'];
print_r($paramsArray);
/*
Array
(
    [id] => header 
    [class] => foo   bar
    [style] => background-color:#fff; color: red; 
)
*/

Upvotes: 0

Amaury Herrera Brito

Reputation: 21

A simple and effective function to solve this:

function attrString2Array($attr) {
  $atList = [];

  if (preg_match_all('/\s*(?:([a-z0-9-]+)\s*=\s*"([^"]*)")|(?:\s+([a-z0-9-]+)(?=\s*|>|\s+[a-z0-9]+))/i', $attr, $m)) {
    for ($i = 0; $i < count($m[0]); $i++) {
      if ($m[3][$i])
        $atList[$m[3][$i]] = null;
      else
        $atList[$m[1][$i]] = $m[2][$i];
    }
  }

  return $atList;
}

print_r(attrString2Array('<li data-tpl-classname="class" data-tpl-title="innerHTML" disabled nowrap href="#" hide src = "images/asas.gif">'));
print_r(attrString2Array('data-tpl-classname="class" data-tpl-title="innerHTML" disabled nowrap href="#" hide src = "images/asas.gif"'));

//Array
//(
//    [data-tpl-classname] => class
//    [data-tpl-title] => innerHTML
//    [disabled] => 
//    [nowrap] => 
//    [href] => #
//    [hide] => 
//    [src] => images/asas.gif
//)

Upvotes: 2

Mariyo

Reputation: 506

An easy way could also be:

$atts_array = current((array) new SimpleXMLElement("<element $attribs />"));

Upvotes: 8

Gumbo

Reputation: 655489

You could use a regular expression to extract that information:

$attribs = ' id= "header " class = "foo   bar" style ="background-color:#fff; color: red; "';
$pattern = '/(\w+)\s*=\s*("[^"]*"|\'[^\']*\'|[^"\'\s>]*)/';
preg_match_all($pattern, $attribs, $matches, PREG_SET_ORDER);
$attrs = array();
foreach ($matches as $match) {
    if ($match[2] !== '' && ($match[2][0] == '"' || $match[2][0] == "'") && $match[2][0] == $match[2][strlen($match[2])-1]) {
        $match[2] = substr($match[2], 1, -1);
    }
    $name = strtolower($match[1]);
    $value = html_entity_decode($match[2]);
    switch ($name) {
    case 'class':
        $attrs[$name] = preg_split('/\s+/', trim($value));
        break;
    case 'style':
        // parse CSS property declarations
        break;
    default:
        $attrs[$name] = $value;
    }
}
var_dump($attrs);

The code above already splits the class value on whitespace. What remains is parsing the property declarations of style, which is a little harder because the value can contain comments and URLs with ; in them.
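A naive sketch of that remaining step, assuming the style value contains no CSS comments and no ; inside url(...) or quoted strings. The helper name parse_style_declarations is made up for illustration; a real CSS tokenizer would be needed for the harder cases.

```php
<?php
// Hypothetical helper: naive parsing of an inline style value.
// Assumes no CSS comments and no ';' inside url(...) or quoted strings.
function parse_style_declarations($style) {
    $declarations = array();
    foreach (explode(';', $style) as $declaration) {
        // Split on the first ':' only, so values such as url(http://...) keep their colons.
        $parts = explode(':', $declaration, 2);
        if (count($parts) !== 2) {
            continue; // empty chunk (e.g. after a trailing ';') or malformed declaration
        }
        $property = strtolower(trim($parts[0]));
        $value = trim($parts[1]);
        if ($property !== '' && $value !== '') {
            $declarations[$property] = $value;
        }
    }
    return $declarations;
}

print_r(parse_style_declarations('background-color:#fff; color: red; '));
```

Its result could be assigned to $attrs[$name] in the style case of the switch above.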

Upvotes: 8

Ken Keenan

Reputation: 10558

Use SimpleXML:

<?php
$attribs = ' id= "header " class = "foo   bar" style ="background-color:#fff; color: red; "';

$x = new SimpleXMLElement("<element $attribs />");

print_r($x);

?>

This assumes that the attributes are always name/value pairs...
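To get from the SimpleXMLElement to the nested array asked for in the question, something along these lines might work. It is only a sketch: the function name attributes_to_array is made up, the input must be well-formed XML (quoted values, no valueless attributes), and the style handling is a naive split that breaks on ; inside url(...) or quoted strings.

```php
<?php
// Hypothetical helper; assumes $attribs is well-formed XML attribute syntax.
function attributes_to_array($attribs) {
    $element = new SimpleXMLElement("<element $attribs />");
    $result = array();
    foreach ($element->attributes() as $name => $value) {
        $value = trim((string) $value);
        if ($name === 'class') {
            // Split the class list on runs of whitespace.
            $result[$name] = preg_split('/\s+/', $value, -1, PREG_SPLIT_NO_EMPTY);
        } elseif ($name === 'style') {
            $result[$name] = array();
            // Naive split: breaks on ';' inside url(...) or quoted strings.
            foreach (explode(';', $value) as $declaration) {
                $parts = explode(':', $declaration, 2);
                if (count($parts) === 2) {
                    $result[$name][trim($parts[0])] = trim($parts[1]);
                }
            }
        } else {
            $result[$name] = $value;
        }
    }
    return $result;
}

$attribs = ' id= "header " class = "foo   bar" style ="background-color:#fff; color: red; "';
print_r(attributes_to_array($attribs));
```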

Upvotes: 24

troelskn

Reputation: 117567

You can't use a regular expression to parse HTML attributes, because the syntax is contextual. You can use regular expressions to tokenize the input, but you need a state machine to parse it.

If performance isn't a big deal, the safest way to do it is probably to wrap the attributes in a tag and then send it through an HTML parser. E.g.:

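function parse_attributes($input) {
  $dom = new DOMDocument();
  // A known tag name (div) avoids libxml warnings about invalid tags.
  $dom->loadHTML("<div " . $input . "></div>");
  // loadHTML() wraps the fragment in <html><body>, so documentElement is
  // <html>; look the wrapper element up by name instead.
  $element = $dom->getElementsByTagName('div')->item(0);
  $attributes = array();
  foreach ($element->attributes as $attr) {
    $attributes[$attr->name] = $attr->value;
  }
  return $attributes;
}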

You could probably optimize the above by reusing the parser, or by using XMLReader or the SAX parser.
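The state-machine idea mentioned at the top can be sketched roughly as follows. This is an illustration only, not a conforming HTML parser: the function name scan_attributes and the state names are made up, the input is assumed to be just the attribute list (no surrounding tag), entities are not decoded, and valueless attributes are mapped to null.

```php
<?php
// Hypothetical helper: a small hand-written state machine for an attribute string.
function scan_attributes($input) {
    $attributes = array();
    $state = 'before_name';
    $name = '';
    $value = '';
    $quote = '';

    for ($i = 0, $length = strlen($input); $i < $length; $i++) {
        $c = $input[$i];
        $isSpace = strpos(" \t\n\r\f", $c) !== false;

        switch ($state) {
            case 'before_name':
                if (!$isSpace) {
                    $name = $c;
                    $state = 'name';
                }
                break;
            case 'name':
                if ($c === '=') {
                    $state = 'before_value';
                } elseif ($isSpace) {
                    $state = 'after_name';
                } else {
                    $name .= $c;
                }
                break;
            case 'after_name':
                if ($c === '=') {
                    $state = 'before_value';
                } elseif (!$isSpace) {
                    // A new name starts, so the previous attribute had no value.
                    $attributes[strtolower($name)] = null;
                    $name = $c;
                    $state = 'name';
                }
                break;
            case 'before_value':
                if ($c === '"' || $c === "'") {
                    $quote = $c;
                    $value = '';
                    $state = 'quoted_value';
                } elseif (!$isSpace) {
                    $value = $c;
                    $state = 'unquoted_value';
                }
                break;
            case 'quoted_value':
                if ($c === $quote) {
                    $attributes[strtolower($name)] = $value;
                    $state = 'before_name';
                } else {
                    $value .= $c;
                }
                break;
            case 'unquoted_value':
                if ($isSpace) {
                    $attributes[strtolower($name)] = $value;
                    $state = 'before_name';
                } else {
                    $value .= $c;
                }
                break;
        }
    }

    // Flush whatever was still pending when the input ended.
    if ($state === 'name' || $state === 'after_name') {
        $attributes[strtolower($name)] = null;
    } elseif ($state === 'before_value') {
        $attributes[strtolower($name)] = '';
    } elseif ($state === 'quoted_value' || $state === 'unquoted_value') {
        $attributes[strtolower($name)] = $value;
    }
    return $attributes;
}

print_r(scan_attributes(' id= "header " class = "foo   bar" disabled data-x=1'));
```

Unlike the XML-based approaches, this accepts unquoted and valueless attributes, at the cost of maintaining the parsing rules by hand.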

Upvotes: 6

TigerTiger

Reputation: 10806

Maybe this library helps you. What it does:

  • An HTML DOM parser written in PHP 5+ that lets you manipulate HTML in a very easy way.
  • Requires PHP 5+.
  • Supports invalid HTML.
  • Finds tags on an HTML page with selectors, just like jQuery.
  • Extracts contents from HTML in a single line.

http://simplehtmldom.sourceforge.net/

Upvotes: 3
