Reputation: 6258
I have a string that looks something like this:
$html_string = "<p>Some content</p><p>separated by</p><p>paragraphs</p>";
I'd like to do some parsing on the content inside the tags, so I think that creating an array from this would be easiest. Currently I'm using a series of explode and implode calls to achieve what I want:
$stripped = explode('<p>', $html_string);
$joined = implode(' ', $stripped);
$parsed = explode('</p>', $joined);
which in effect gives:
array('Some content', 'separated by', 'paragraphs');
Is there a better, more robust way to create an array from HTML tags? Looking at the docs, I didn't see any mention of parsing via a regular expression.
Thanks for your help!
Upvotes: 2
Views: 74
Reputation: 350232
Here is the DOMDocument solution (native PHP), which will also work when your p tags have attributes, contain other tags like <br>, have lots of whitespace between them (which is irrelevant in HTML rendering), or contain HTML entities like &nbsp; or &lt;:
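$html_string = "<p>Some content</p><p>separated by</p><p>paragraphs</p>";
$doc = new DOMDocument();
$doc->loadHTML($html_string);
$paras = []; // initialise so the result is defined even when there are no <p> tags
foreach ($doc->getElementsByTagName('p') as $p) {
    // textContent is the element's text with nested tags removed and entities decoded
    $paras[] = $p->textContent;
}
// Output array:
print_r($paras);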
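To illustrate the cases listed above (attributes, nested tags, whitespace between paragraphs, entities), here is a minimal sketch that wraps the same loop in a function. The name paragraph_texts, the empty-input guard, the trim() call, and the libxml error suppression are my own assumptions, not part of the original answer:

```php
<?php
// Hypothetical helper: returns the trimmed text of every <p> element in $html.
function paragraph_texts(string $html): array
{
    // loadHTML() rejects an empty string, so return early for blank input.
    if (trim($html) === '') {
        return [];
    }

    $doc = new DOMDocument();
    // Collect parser warnings internally instead of emitting them for sloppy markup.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $paras = [];
    foreach ($doc->getElementsByTagName('p') as $p) {
        // textContent drops nested tags and decodes entities.
        $paras[] = trim($p->textContent);
    }
    return $paras;
}

// Attribute on the first <p>, a nested <b>, blank lines between tags, and an entity.
$html = "<p class='intro'>Some <b>content</b></p>\n\n  <p>a &lt; b</p>";
print_r(paragraph_texts($html)); // two entries: "Some content" and "a < b"
```

The sample input is ASCII only. For other encodings you would probably need to tell the parser which charset to expect.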
If you really want to stick with regular expressions, then at least allow tag attributes and HTML entities, translating the latter to their corresponding characters:
$html_string = "<p>Some content &amp; text</p><p>separated by</p><p style='background:yellow'>paragraphs</p>";
preg_match_all('/<p(?:\s.*?)?>\s*(.*?)\s*<\/p\s*>/si', $html_string, $matches);
// array_walk() discards the callback's return value, so use array_map() to decode entities
$paras = array_map('html_entity_decode', $matches[1]);
print_r($paras);
Upvotes: 0
Reputation: 2201
If it's really that simple, with no attributes on the p tags and no other tags inside the content, you can simply use a regex for that:
$string = '<p>Some content</p><p>separated by</p><p>paragraphs</p>';
preg_match_all('/<p>([^<]*?)<\/p>/mi', $string, $matches);
var_dump($matches[1]);
which creates this output:
array(3) {
  [0]=>
  string(12) "Some content"
  [1]=>
  string(12) "separated by"
  [2]=>
  string(10) "paragraphs"
}
Keep in mind that this is neither the most effective nor the fastest way, but it's shorter than using DOMDocument or anything like that.
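To show where the "only that simple" condition bites, here is a small sketch using the same pattern on two inputs. The second input string and the variable names $plain and $mixed are made up for illustration:

```php
<?php
$pattern = '/<p>([^<]*?)<\/p>/mi';

// Plain paragraphs: all three are captured.
preg_match_all($pattern, '<p>Some content</p><p>separated by</p><p>paragraphs</p>', $plain);

// An attribute on <p> or a nested tag hides that paragraph from the pattern:
// "<p class=...>" is not the literal "<p>", and [^<] cannot cross the "<br>".
preg_match_all($pattern, '<p class="x">one</p><p>two <br>lines</p><p>three</p>', $mixed);

var_dump(count($plain[1])); // int(3)
var_dump($mixed[1]);        // only "three" is captured
```

So the pattern silently skips paragraphs it cannot handle rather than failing, which is worth knowing before relying on it.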
Upvotes: 1
Reputation: 548
If you need to do some HTML parsing in PHP, there is a nice library for that called PHP Html Parser:
https://github.com/paquettg/php-html-parser
It gives you a jQuery-like API for parsing HTML.
An example:
// Assuming you installed from Composer:
require "vendor/autoload.php";
use PHPHtmlParser\Dom;
$dom = new Dom();
$dom->load('<p>Some content</p><p>separated by</p><p>paragraphs</p>');
$pTags = $dom->find('p');
foreach ($pTags as $tag) {
    // do something with the html
    $content = $tag->innerHtml;
}
Upvotes: 0