Reputation:
$text = "<p>this is the first paragraph</p><p>this is the first paragraph</p>";
I need to split the above into an array delimited by the paragraph tags. That is, I need to split the above into an array with two elements:
array ([0] = "this is the first paragraph", [1] = "this is the first paragraph")
Upvotes: 9
Views: 21125
Reputation: 5220
This is an old question, but I could not find a reasonable solution after an hour of searching Stack Overflow answers. If you have a string full of HTML tags (p tags) and you want to get the paragraphs (or just the first paragraph), use DOMDocument.
$long_description is a string that has <p> tags in it.
$long_descriptionDOM = new DOMDocument();
// This is how you use it with UTF-8
$long_descriptionDOM->loadHTML((mb_convert_encoding($long_description, 'HTML-ENTITIES', 'UTF-8')));
$paragraphs = $long_descriptionDOM->getElementsByTagName('p');
// textContent is a property, not a method
$first_paragraph = $paragraphs->item(0)->textContent;
I guess that this is the right solution. No need for regex.
edit: YOU SHOULD NOT USE REGEX TO PARSE HTML.
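To get every paragraph as an array (which is what the question asks for) rather than only the first one, the same DOMDocument approach can be looped. This is a minimal sketch, not a definitive implementation: the helper name paragraphs_from_html is made up for illustration, and the XML encoding prefix is just one common way to make loadHTML() read the input as UTF-8.

```php
<?php
// Hypothetical helper: returns the text of every <p> element in an HTML fragment.
function paragraphs_from_html(string $html): array
{
    $dom = new DOMDocument();

    // Collect libxml warnings internally instead of emitting them,
    // since loadHTML() can complain about fragments and unknown tags.
    $previous = libxml_use_internal_errors(true);

    // Assumption: the input is UTF-8; the prefix hints that to the parser.
    $dom->loadHTML('<?xml encoding="UTF-8">' . $html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $result = [];
    foreach ($dom->getElementsByTagName('p') as $p) {
        $result[] = $p->textContent;
    }
    return $result;
}

$text = "<p>this is the first paragraph</p><p>this is the first paragraph</p>";
print_r(paragraphs_from_html($text));
```

Because the text comes from the parsed DOM, attributes on the p tags and entities inside them are handled by the parser rather than by string matching.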
Upvotes: 3
Reputation: 2189
For anyone else who finds this, don't forget that a p tag may have styles, IDs, or other attributes, so you should probably use something like this:
// \b keeps the pattern from also matching tags such as <pre> or <param>
$ps = preg_split('#<p\b[^>]*>#', $input);
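Note that splitting on the opening tag alone leaves the closing </p> in each piece and an empty first element. If only the paragraph contents are wanted, one option is to capture them with preg_match_all instead. A rough sketch, assuming well-formed paragraphs; the sample $input is invented for illustration:

```php
<?php
$input = '<p class="intro">first</p><p id="second" style="color:red">second</p>';

// <p\b[^>]*> matches an opening <p> tag with any attributes (but not <pre>);
// (.*?) lazily captures everything up to the next closing </p>.
preg_match_all('#<p\b[^>]*>(.*?)</p>#is', $input, $matches);

// $matches[1] holds the contents of the first capture group for each match.
$ps = $matches[1];
print_r($ps);
```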
Upvotes: 11
Reputation: 21
$text = "<p>this is the first paragraph</p><p>this is the first paragraph</p>";
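$exptext = explode("<p>", $text);
// $exptext[0] is the empty string before the first <p>, so start at index 1;
// strip_tags() removes the trailing </p> left in each piece
echo strip_tags($exptext[1]);
echo "<br>";
echo strip_tags($exptext[2]);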
//////////////// OUTPUT /////////////////
this is the first paragraph
this is the first paragraph
Upvotes: 1
Reputation: 21553
Remove the closing </p> tags, as we don't need them, and then explode the string into an array on the opening <p> tags.
$text = "<p>this is the first paragraph</p><p>this is the first paragraph</p>";
$text = str_replace('</p>', '', $text);
$array = explode('<p>', $text);
This code leaves you with an empty array entry at index 0. If that is a problem, it can easily be removed by calling array_shift($array) before using the array.
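If the input might produce empty entries in other positions as well, a slightly more general variant is to filter them out instead of shifting only the first one. A minimal sketch of that idea, not part of the original answer:

```php
<?php
$text = "<p>this is the first paragraph</p><p>this is the first paragraph</p>";
$text = str_replace('</p>', '', $text);
$array = explode('<p>', $text);

// Drop empty strings, then reindex so the keys start at 0 again.
$array = array_values(array_filter($array, function ($part) {
    return $part !== '';
}));

print_r($array);
```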
Upvotes: 25
Reputation: 6190
Try this.
<?php
$text = "<p>this is the first paragraph</p><p>this is the first paragraph</p>";
$array = json_decode(json_encode((array) simplexml_load_string('<data>'.$text.'</data>')),1);
print_r($array['p']);
?>
Upvotes: 0
Reputation: 12709
Try the following:
<?php
$text = "<p>this is the first paragraph</p><p>this is the first paragraph</p>";
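$array = [];
// Capture each paragraph's contents; the reference lets the closure append to $array
preg_replace_callback("`<p>(.+)</p>`isU", function ($matches) use (&$array) {
    $array[] = $matches[1];
    return $matches[0]; // leave the matched text unchanged
}, $text);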
var_dump($array);
?>
This can be modified by putting the array in a class that manages it with an add-value method and a getter.
Upvotes: 0
Reputation: 145482
If your input is somewhat consistent you can use a simple split method as:
// The third argument is the limit (-1 = no limit); flags go in the fourth
$paragraphs = preg_split('~(</?p>\s*)+~', $text, -1, PREG_SPLIT_NO_EMPTY);
Here preg_split looks for runs of <p> and </p> plus possible whitespace and separates the string there.
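As an illustration of how this behaves on input with newlines between paragraphs, here is a small sketch. The sample string is invented; note that -1 is passed as the limit so that PREG_SPLIT_NO_EMPTY lands in the flags position.

```php
<?php
$text = "<p>one</p>\n<p>two</p>\n";

// Each run of <p>/</p> tags plus trailing whitespace acts as one delimiter;
// PREG_SPLIT_NO_EMPTY drops the empty pieces at the start and end.
$paragraphs = preg_split('~(</?p>\s*)+~', $text, -1, PREG_SPLIT_NO_EMPTY);

print_r($paragraphs);
```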
As an alternative (unnecessary for a case this simple), you can also use querypath or phpquery to extract only complete paragraph contents:
foreach (htmlqp($text)->find("p") as $p) { print $p->text(); }
Upvotes: 0
Reputation: 1768
Try this code:
<?php
$textArray = explode("<p>", $text);
for ($i = 0; $i < sizeof($textArray); $i++) {
$textArray[$i] = strip_tags($textArray[$i]);
}
Upvotes: 0