I need to split text delimited by paragraph tag

$text = "<p>this is the first paragraph</p><p>this is the first paragraph</p>";

I need to split the above into an array delimited by the paragraph tags. That is, I need to split the above into an array with two elements:

array([0] => "this is the first paragraph", [1] => "this is the first paragraph")

Upvotes: 9

Answers (8)

Michal

Reputation: 5220

This is an old question, but I was not able to find any reasonable solution in an hour of looking through Stack Overflow answers. If you have a string full of HTML tags (p tags) and you want to get the paragraphs (or just the first paragraph), use DOMDocument.

$long_description is a string that has <p> tags in it.

$long_descriptionDOM = new DOMDocument();
// This is how you use it with UTF-8
$long_descriptionDOM->loadHTML((mb_convert_encoding($long_description, 'HTML-ENTITIES', 'UTF-8')));
$paragraphs = $long_descriptionDOM->getElementsByTagName('p');
// textContent is a property, not a method
$first_paragraph = $paragraphs->item(0)->textContent;

I guess that this is the right solution. No need for regex.

edit: YOU SHOULD NOT USE REGEX TO PARSE HTML.
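To get every paragraph as an array, as the question asks, the same DOMDocument approach can be extended with a loop. This is only a sketch: the function name paragraphs_from_html is made up for illustration, and the XML encoding prefix is one common way to make libxml read the fragment as UTF-8, used here as an assumption in place of mb_convert_encoding.

```php
<?php
// Hypothetical helper: returns the text of every <p> element in an HTML fragment.
function paragraphs_from_html(string $html): array
{
    $dom = new DOMDocument();

    // Collect libxml warnings about imperfect markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // Assumption: the input is UTF-8; the prefix hints that encoding to libxml.
    $dom->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $result = [];
    foreach ($dom->getElementsByTagName('p') as $p) {
        $result[] = $p->textContent;
    }
    return $result;
}

$text = "<p>this is the first paragraph</p><p>this is the first paragraph</p>";
$paragraphs = paragraphs_from_html($text);
print_r($paragraphs);
```

Because the parser handles the tags, attributes on the p elements and markup nested inside them do not need special treatment.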

Upvotes: 3

Ukuser32

Reputation: 2189

For anyone else who finds this, don't forget that a p tag may have styles, ids, or any other attributes, so you should probably look at something like this:

$ps = preg_split('#<p([^>])*>#',$input);
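A variant of the same idea, offered as a sketch rather than a replacement: it captures the paragraph bodies directly with preg_match_all, so no closing tags or empty entries are left over. It assumes the paragraphs are not nested and every one is closed; the sample $input is made up for illustration.

```php
<?php
// Made-up sample input with attributes on the opening tags.
$input = '<p class="intro">first</p><p id="second" style="color:red">second</p>';

// \b keeps the pattern from matching tags such as <pre>;
// [^>]* skips any attributes; (.*?) lazily captures the paragraph body.
preg_match_all('~<p\b[^>]*>(.*?)</p>~is', $input, $matches);

$ps = $matches[1];
print_r($ps);
```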

Upvotes: 11

bigN

Reputation: 21

$text = "<p>this is the first paragraph</p><p>this is the first paragraph</p>";

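$exptext = explode("<p>", $text);

// explode() leaves an empty string at index 0 (the text before the first <p>),
// so the paragraphs are at indexes 1 and 2. Each still ends with </p>.
echo $exptext[1];
echo "<br>";
echo $exptext[2];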

//////////////// OUTPUT /////////////////

this is the first paragraph
this is the first paragraph

Upvotes: 1

Treffynnon

Reputation: 21553

Remove the closing </p> tags as we don't need them and then explode the string into an array on opening <p> tags.

$text = "<p>this is the first paragraph</p><p>this is the first paragraph</p>";
$text = str_replace('</p>', '', $text);
$array = explode('<p>', $text);

To see the code run please see the following codepad entry. As you can see this code will leave you with an empty array entry at index 0. If this is a problem then it can easily be removed by calling array_shift($array) before using the array.
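Putting the steps together with the array_shift() call, a minimal sketch looks like this. It assumes the string starts with a <p> tag, so the entry at index 0 is always the empty one.

```php
<?php
$text = "<p>this is the first paragraph</p><p>this is the first paragraph</p>";

$text = str_replace('</p>', '', $text); // drop the closing tags
$array = explode('<p>', $text);         // split on the opening tags
array_shift($array);                    // discard the empty entry at index 0

print_r($array);
```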

Upvotes: 25

Prasad Rajapaksha

Reputation: 6190

Try this.

<?php
$text = "<p>this is the first paragraph</p><p>this is the first paragraph</p>";
$array = json_decode(json_encode((array) simplexml_load_string('<data>'.$text.'</data>')),1);
print_r($array['p']);
?>

Upvotes: 0

Jaffa

Reputation: 12709

Try the following:

<?php
$text = "<p>this is the first paragraph</p><p>this is the first paragraph</p>";

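$array = [];

// Collect each paragraph body; the U modifier makes .+ non-greedy.
preg_replace_callback("`<p>(.+)</p>`isU", function ($matches) use (&$array) {
    $array[] = $matches[1];
    return $matches[0]; // leave the original text unchanged
}, $text);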

var_dump($array);

?>

This can be modified by putting the array in a class that manages it, with a method to add values and a getter.

Upvotes: 0

mario

Reputation: 145482

If your input is somewhat consistent you can use a simple split method as:

 $paragraphs = preg_split('~(</?p>\s*)+~', $text, -1, PREG_SPLIT_NO_EMPTY);

Here preg_split looks for combinations of <p> and </p> plus possible whitespace and separates the string there. The third argument is the limit (-1 means no limit); the flags go in the fourth.

As an unnecessary alternative, you can also use QueryPath or phpQuery to extract only complete paragraph contents using:

 foreach (htmlqp($text)->find("p") as $p) { print $p->text(); }

Upvotes: 0

Ocracoke

Reputation: 1768

Try this code:

<?php
$textArray = explode("<p>", $text);

for ($i = 0; $i < sizeof($textArray); $i++) {
    $textArray[$i] = strip_tags($textArray[$i]);
}

Upvotes: 0
