Reputation: 3608
I have some fairly large paragraphs (5000-6000 words) containing text and embedded HTML tags. I want to break each paragraph into chunks of 1500 words, where the count covers only the actual words and ignores the HTML markup. Using strip_tags I can count the number of words (ignoring the markup), but I can't figure out how to break the paragraph into chunks of 1500 words while still keeping the markup in the output. For example
This is <b> a </b> paragraph which <a href="#"> has some </a> some text to be broken in <h1> 5 words </h1>.
The result should be
1 = This is <b> a </b> paragraph which
2 = <a href="#"> has some </a> some text to
3 = be broken in <h1> 5 words </h1>.
Upvotes: 4
Views: 238
Reputation: 10502
Use an XML DOM parser or an HTML DOM parser. Walk the text nodes, count the words, and start a new chunk whenever the count exceeds N.
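A rough sketch of what a DOM-based version could look like, using PHP's bundled DOMDocument. The function names chunkHtmlWithDom and wordCount are made up for illustration, and the sketch assumes UTF-8 input and only regroups the top-level nodes of the paragraph: elements are kept whole, plain text is split word by word.

```php
<?php
// Words = whitespace-separated runs containing a letter or digit (assumes UTF-8).
function wordCount(string $text): int
{
    return (int) preg_match_all('/\S*[\p{L}\p{N}]\S*/u', $text);
}

// Hypothetical helper: groups the top-level nodes of an HTML fragment into
// chunks of roughly $limit words. Elements are never cut open.
function chunkHtmlWithDom(string $html, int $limit): array
{
    libxml_use_internal_errors(true); // tolerate sloppy real-world markup
    $doc = new DOMDocument();
    // The XML declaration is a common workaround so loadHTML() reads UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8"><div>' . $html . '</div>');
    libxml_clear_errors();
    $root = $doc->getElementsByTagName('div')->item(0);
    if ($root === null) {
        return [];
    }

    // Flatten the fragment into [markup, wordCount] units.
    $units = [];
    foreach ($root->childNodes as $node) {
        if ($node->nodeType === XML_TEXT_NODE) {
            // Each piece is one word plus its surrounding whitespace.
            preg_match_all('/\s*\S+\s*|\s+/u', $node->nodeValue, $m);
            foreach ($m[0] as $piece) {
                $units[] = [htmlspecialchars($piece), wordCount($piece)];
            }
        } else {
            $words = $node instanceof DOMElement ? wordCount($node->textContent) : 0;
            $units[] = [$doc->saveHTML($node), $words];
        }
    }

    // Group units; start a new chunk when the next unit would exceed the limit.
    $chunks = [];
    $current = '';
    $count = 0;
    foreach ($units as [$markup, $words]) {
        if ($count > 0 && $count + $words > $limit) {
            $chunks[] = trim($current);
            $current = '';
            $count = 0;
        }
        $current .= $markup;
        $count += $words;
    }
    if (trim($current) !== '') {
        $chunks[] = trim($current);
    }
    return $chunks;
}

$sample = 'This is <b> a </b> paragraph which <a href="#"> has some </a> some text to be broken in <h1> 5 words </h1>.';
print_r(chunkHtmlWithDom($sample, 5));
```

A single element containing more than $limit words still ends up in one oversized chunk; splitting inside an element would mean cloning its ancestor tags into both halves, which this sketch does not attempt.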
Upvotes: 0
Reputation: 11352
I think you're going to need to parse your html if you want to guarantee valid markup. In which case this question should provide a really useful starting point.
Upvotes: 1
Reputation: 1052
Think about using the explode() function wisely. Or, better but longer, use a regular expression that matches either a single word or a tag together with all the text inside it. You should treat an HTML element and its contents as an unbreakable entity. For example, you can write a function that breaks your large paragraph into the following array of entities:
$data = array(
    array("count" => 2, "text" => "This is "),
    array("count" => 1, "text" => "<b> a </b>"),
    array("count" => 2, "text" => " paragraph which"),
    // ... and so on for the rest of the paragraph
);
Then write a loop that builds the smaller paragraphs from the $data array by appending entities until the word count reaches the limit.
Also, it won't always be possible to make each chunk exactly 1500 words long. A chunk can come out slightly longer or shorter, because you should not split an HTML element across two chunks.
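Here is a minimal sketch of that idea, assuming UTF-8 input and markup without nested elements of the same tag (a regex cannot reliably pair those up). The names chunkHtmlWithRegex and countWords are hypothetical, not from any library. Instead of storing the entities in a $data array first, the loop counts and groups them in one pass.

```php
<?php
// Count whitespace-separated words that contain at least one letter or digit,
// ignoring markup. Assumes valid UTF-8 input.
function countWords(string $html): int
{
    return (int) preg_match_all('/\S*[\p{L}\p{N}]\S*/u', strip_tags($html));
}

// Hypothetical helper: split $html into chunks of roughly $limit words,
// treating each element (opening tag, contents, closing tag) as unbreakable.
function chunkHtmlWithRegex(string $html, int $limit): array
{
    // One entity = optional leading whitespace plus either a whole element,
    // a lone tag (e.g. <br>), or a run of non-space characters.
    $pattern = '~\s*(?:<([a-z][a-z0-9]*)\b[^>]*>.*?</\1\s*>|<[^>]*>|[^\s<]+)~isu';
    preg_match_all($pattern, $html, $matches);

    $chunks = [];
    $current = '';
    $count = 0;
    foreach ($matches[0] as $entity) {
        $words = countWords($entity);
        // Start a new chunk when this entity would push the count past the limit.
        if ($count > 0 && $count + $words > $limit) {
            $chunks[] = trim($current);
            $current = '';
            $count = 0;
        }
        $current .= $entity;
        $count += $words;
    }
    if (trim($current) !== '') {
        $chunks[] = trim($current);
    }
    return $chunks;
}

$text = 'This is <b> a </b> paragraph which <a href="#"> has some </a> some text to be broken in <h1> 5 words </h1>.';
print_r(chunkHtmlWithRegex($text, 5));
```

With a limit of 5 this reproduces the three chunks from the question. An element that holds more words than the limit is kept whole, so that chunk simply runs long.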
Upvotes: 2