user813077

I need to split text delimited by paragraph tags

$text = "<p>this is the first paragraph</p><p>this is the first paragraph</p>";

I need to split the above into an array delimited by the paragraph tags. That is, I need to split the above into an array with two elements:

array([0] => "this is the first paragraph", [1] => "this is the first paragraph")

Upvotes: 9

Views: 21125

Answers (8)

Michal

Reputation: 5220

This is an old question, but I could not find a reasonable solution after an hour of searching Stack Overflow answers. If you have a string full of HTML tags (p tags) and you want to get the paragraphs (or just the first paragraph), use DOMDocument.

$long_description is a string that has <p> tags in it.

$long_descriptionDOM = new DOMDocument();
// This is how you use it with UTF-8 (note: the 'HTML-ENTITIES' encoding is deprecated as of PHP 8.2)
$long_descriptionDOM->loadHTML((mb_convert_encoding($long_description, 'HTML-ENTITIES', 'UTF-8')));
$paragraphs = $long_descriptionDOM->getElementsByTagName('p');
// textContent is a property, not a method
$first_paragraph = $paragraphs->item(0)->textContent;

I guess that this is the right solution. No need for regex.

edit: YOU SHOULD NOT USE REGEX TO PARSE HTML.
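
To get every paragraph as an array rather than only the first one, the same DOMDocument idea can be sketched roughly as follows. The function name paragraphsFromHtml is made up for this example, and the XML encoding prefix is a commonly used workaround for UTF-8 input rather than anything from the answer above.

```php
<?php
// Hypothetical helper: returns the text of every <p> element in an HTML fragment.
function paragraphsFromHtml(string $html): array
{
    $doc = new DOMDocument();

    // Collect libxml parse warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    // Assumption: prefixing an XML encoding declaration makes loadHTML() read the input as UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $result = [];
    foreach ($doc->getElementsByTagName('p') as $p) {
        $result[] = $p->textContent; // property access, no parentheses
    }
    return $result;
}

print_r(paragraphsFromHtml('<p>this is the first paragraph</p><p class="x">this is the second paragraph</p>'));
```

Because the parser handles the tags, attributes on the p elements need no special treatment here.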

Upvotes: 3

Ukuser32

Reputation: 2189

For anyone else who finds this: don't forget that a p tag may have styles, IDs or any other attributes, so you should probably look at something like this (the \b keeps it from also matching tags such as <pre>):

$ps = preg_split('#<p\b[^>]*>#i', $input);
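
If the closing tags should be dropped as well, one possible variation is to capture the contents with preg_match_all instead of splitting. This is only a sketch for simple, non-nested markup; the sample input and variable names are assumptions made for illustration.

```php
<?php
// Hypothetical sample input; the attributes are only there for illustration.
$input = '<p class="intro">first paragraph</p><p id="second" style="color:red">second paragraph</p>';

// <p\b[^>]*> matches an opening p tag with any attributes (but not <pre>);
// (.*?) lazily captures everything up to the nearest closing </p>.
preg_match_all('#<p\b[^>]*>(.*?)</p>#is', $input, $matches);

// $matches[1] holds the captured paragraph contents.
$paragraphs = $matches[1];
print_r($paragraphs);
```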

Upvotes: 11

bigN

Reputation: 21

$text = "<p>this is the first paragraph</p><p>this is the first paragraph</p>";

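$exptext = explode("<p>", $text);

// $exptext[0] is an empty string because $text starts with "<p>",
// and each remaining element still ends with "</p>", so strip it.
echo strip_tags($exptext[1]);
echo "<br>";
echo strip_tags($exptext[2]);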

//////////////// OUTPUT /////////////////

this is the first paragraph
this is the first paragraph

Upvotes: 1

Treffynnon

Reputation: 21553

Remove the closing </p> tags, as we don't need them, and then explode the string into an array on the opening <p> tags.

$text = "<p>this is the first paragraph</p><p>this is the first paragraph</p>";
$text = str_replace('</p>', '', $text);
$array = explode('<p>', $text);

To see the code run please see the following codepad entry. As you can see this code will leave you with an empty array entry at index 0. If this is a problem then it can easily be removed by calling array_shift($array) before using the array.
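
As a rough illustration of that empty first entry and the array_shift cleanup, here is a sketch that uses a slightly different sample string so the two paragraphs can be told apart (the sample text is an assumption, not from the question):

```php
<?php
$text = "<p>this is the first paragraph</p><p>this is the second paragraph</p>";

$array = explode('<p>', str_replace('</p>', '', $text));
// $array is now ['', 'this is the first paragraph', 'this is the second paragraph']

// Drop the empty leading element, but only if it really is empty.
if ($array !== [] && $array[0] === '') {
    array_shift($array);
}

print_r($array);
```

The guard is there in case the input does not start with a <p> tag, in which case the first element holds real text.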

Upvotes: 25

Prasad Rajapaksha

Reputation: 6190

Try this.

<?php
$text = "<p>this is the first paragraph</p><p>this is the first paragraph</p>";
$array = json_decode(json_encode((array) simplexml_load_string('<data>'.$text.'</data>')),1);
print_r($array['p']);
?>

Upvotes: 0

Jaffa

Reputation: 12709

Try the following:

<?php
$text = "<p>this is the first paragraph</p><p>this is the first paragraph</p>";

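$array = [];

// Collect each paragraph's inner text; return the match so the callback yields a string.
preg_replace_callback("`<p>(.+)</p>`isU", function ($matches) use (&$array) {
    $array[] = $matches[1];
    return $matches[0];
}, $text);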

var_dump($array);

?>

This can be modified by putting the array in a class that manages it, with a method to add a value and a getter.
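
A minimal sketch of what such a class might look like. The class name ParagraphCollector and its methods add() and all() are invented for this example, and typed properties assume PHP 7.4 or later.

```php
<?php
// Hypothetical collector class: stores values and exposes them through a getter.
final class ParagraphCollector
{
    private array $items = [];

    public function add(string $value): void
    {
        $this->items[] = $value;
    }

    public function all(): array
    {
        return $this->items;
    }
}

$text = "<p>first</p><p>second</p>";
$collector = new ParagraphCollector();

// Objects are passed by handle, so the closure can use the collector directly.
preg_replace_callback('`<p>(.+)</p>`isU', function (array $matches) use ($collector): string {
    $collector->add($matches[1]);
    return $matches[0]; // leave the text unchanged
}, $text);

var_dump($collector->all());
```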

Upvotes: 0

mario

Reputation: 145482

If your input is somewhat consistent you can use a simple split method as:

 $paragraphs = preg_split('~(</?p>\s*)+~', $text, -1, PREG_SPLIT_NO_EMPTY);

Where the preg_split will look for combinations of <p> and </p> plus possible whitespace and separate the string there.

As an (unnecessary) alternative, you can also use a library such as QueryPath, whose htmlqp() function is used below, to extract only complete paragraph contents:

 foreach (htmlqp($text)->find("p") as $p) { print $p->text(); }
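
For the preg_split variant, note that the third parameter is the limit and the flags come fourth, so a runnable sketch would pass -1 (no limit) before PREG_SPLIT_NO_EMPTY. The sample string with a newline between paragraphs is an assumption chosen to show the whitespace handling.

```php
<?php
$text = "<p>first paragraph</p>\n<p>second paragraph</p>";

// Split on runs of <p> / </p> tags plus trailing whitespace;
// -1 means "no limit", and PREG_SPLIT_NO_EMPTY drops the empty pieces
// produced at the start and end of the string.
$paragraphs = preg_split('~(?:</?p>\s*)+~', $text, -1, PREG_SPLIT_NO_EMPTY);

print_r($paragraphs);
```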

Upvotes: 0

Ocracoke

Reputation: 1768

Try this code:

<?php
$textArray = explode("<p>", $text);

for ($i = 0; $i < sizeof($textArray); $i++) {
    $textArray[$i] = strip_tags($textArray[$i]);
}

Upvotes: 0
