Alex

Reputation: 163

Get the strings inside a pattern using PHP

I have an HTML file and I want to collect all of its class names into an array using PHP. For example, this is my HTML file:

<div class="main menu">element</div>
<div class="content"></div>

I want to get an array with three elements (in this particular example): "main", "menu", "content".

In bash it is possible to use grep to accomplish this:

classes=($(grep -oP '(?<=class=").*?(?=")' "./index.html"))

How can I do the same in PHP?

I have this basic code at this moment:

// read the entire file into a string
$str = file_get_contents('./index.html');
// here I guess should go the code that extracts all of the class names
// now, save the file
file_put_contents('./index.html', $str);

Edit: How can my question be a duplicate of the one linked, when I am asking how to find the strings using PHP? It is not about bash, and I have already provided the grep version.

Upvotes: 1

Views: 208

Answers (3)

Jonny 5

Reputation: 12389

To get the three elements, try a regex like this with the preg_match_all function:

(?:class="|\G(?!^))\s*\K[^\s"]+
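  • \G matches where the previous match ended (or at the start of the subject); (?!^) rules out the start, so this branch only continues a previous match
  • \K resets the start of the reported match, so class=" and any leading whitespace are left out of the result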

See test at eval.in

if(preg_match_all('/(?:class="|\G(?!^))\s*\K[^\s"]+/', $str, $out) > 0)
  print_r($out[0]);

Array ( [0] => main [1] => menu [2] => content )

Note that regex is generally not an appropriate tool for parsing HTML. IMHO it depends on whether you are parsing your own HTML or arbitrary HTML, and on what you are trying to achieve.
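For comparison with the grep command in the question, a rough PHP translation might look like this. It is a sketch, not a drop-in replacement: it reuses the same lookbehind/lookahead pattern with preg_match_all() and then uses preg_split() to imitate the word splitting bash applies to the unquoted $(...). The function name classesLikeGrep and the inline $html sample are illustrative assumptions:

```php
<?php
// Sketch (hypothetical helper): mirror the bash pipeline from the question.
// grep -oP '(?<=class=").*?(?=")' outputs each class attribute value, and
// bash word splitting then turns "main menu" into two array elements.
function classesLikeGrep(string $html): array
{
    $classes = [];
    // Same lookbehind/lookahead pattern as the grep command; without the /s
    // modifier "." does not cross newlines, like grep's line-by-line matching.
    if (preg_match_all('/(?<=class=").*?(?=")/', $html, $m)) {
        foreach ($m[0] as $value) {
            // Emulate word splitting: split on runs of whitespace, drop empty parts.
            $parts = preg_split('/\s+/', $value, -1, PREG_SPLIT_NO_EMPTY);
            $classes = array_merge($classes, $parts);
        }
    }
    return $classes;
}

$html = '<div class="main menu">element</div>' . "\n" . '<div class="content"></div>';
print_r(classesLikeGrep($html));
```

Like the grep version, this only sees double-quoted class attributes whose value sits on a single line, which is one reason a DOM-based approach is more robust.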

Upvotes: 4

You Old Fool

Reputation: 22959

I would use PHP's DOMDocument class like this:

$classes = array();
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTMLFile('./index.html');
$elements = $dom->getElementsByTagName('*');
foreach($elements as $element) {
    $classes = array_merge($classes,array_filter(explode(' ',$element->getAttribute('class'))));
}
print_r($classes);

Explanation:

  • declare an empty array $classes
  • suppress the warnings libxml emits for incomplete or invalid HTML
  • instantiate a new DOMDocument object
  • load index.html into the DOMDocument
  • get all elements using the wildcard tag name *
  • iterate over the elements
  • get each element's class attribute
  • explode the class attribute on spaces
  • filter the exploded array to remove empty values
  • merge the result into the $classes array
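A variant of the same idea, sketched under the assumption that you want unique names and that class attributes may separate names with tabs or newlines rather than single spaces: it queries only elements that have a class attribute via DOMXPath, splits on any whitespace with preg_split(), and de-duplicates with array_unique(). collectClasses is a hypothetical helper name, and it takes an HTML string (for example from file_get_contents('./index.html')) rather than a file path:

```php
<?php
// Sketch (hypothetical helper): DOM-based extraction of unique class names.
function collectClasses(string $html): array
{
    // Silence libxml warnings for sloppy HTML, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $classes = [];
    // Visit only elements that actually carry a class attribute.
    foreach ($xpath->query('//*[@class]') as $element) {
        // Split on any whitespace (spaces, tabs, newlines) and drop empty parts.
        $parts = preg_split('/\s+/', $element->getAttribute('class'), -1, PREG_SPLIT_NO_EMPTY);
        $classes = array_merge($classes, $parts);
    }
    // Keep the first occurrence of each class name and reindex the array.
    return array_values(array_unique($classes));
}

$html = "<div class=\"main\tmenu\">element</div>\n<div class=\"content main\"></div>";
print_r(collectClasses($html));
```

Whether to keep duplicates is a design choice: drop the array_unique() call if you need every occurrence, as in the answer's original loop.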

Upvotes: 4

Jesse Weigert

Reputation: 4854

Depending on what you're trying to do, you can either use regular expressions (preg_match_all to extract the matches from a string, or preg_grep to filter an array of lines), or you could traverse the DOM using the DOMDocument class.
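Note that preg_grep() filters an array of strings rather than extracting substrings, so on its own it returns whole matching lines. A minimal sketch of one way to combine it with preg_match_all(), assuming the HTML has already been split into lines (for example with file('./index.html')); the function name classesFromLines and the sample $lines are hypothetical:

```php
<?php
// Sketch (hypothetical helper): narrow lines with preg_grep(), then extract.
function classesFromLines(array $lines): array
{
    $classes = [];
    // preg_grep() keeps only the array elements (lines) that match the pattern.
    foreach (preg_grep('/class="/', $lines) as $line) {
        // \G continues right after the previous match; \K drops class=" from the result.
        preg_match_all('/(?:class="|\G(?!^))\s*\K[^\s"]+/', $line, $m);
        $classes = array_merge($classes, $m[0]);
    }
    return $classes;
}

$lines = ['<div class="main menu">element</div>', '<p>no classes here</p>', '<div class="content"></div>'];
print_r(classesFromLines($lines));
```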

Upvotes: 1
