Reputation: 11049
I would like to extract blocks of text containing more than 100 words from a large HTML page using PHP. Whether the text is wrapped in <p>...</p> doesn't matter.
I only care about the number of words that make up a coherent text block, so text outside of HTML paragraphs should also be taken into consideration.
How can this be done?
Upvotes: 0
Views: 1582
Reputation: 15358
I use phpQuery. Are you familiar with jQuery? They share the same syntax. You might be concerned about installing a new library, but trust me, this library is well worth the extra overhead.
You can then access it like this:
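// Assumes $html holds the page markup and phpQuery is already loaded.
$doc = phpQuery::newDocument($html);
foreach ($doc->find('p') as $element) {
    // Wrap the raw DOM node so the jQuery-style methods are available.
    $element = pq($element);
    echo str_word_count($element->text()), "\n";
}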
Upvotes: 5
Reputation: 33881
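Use the PHP Simple HTML DOM Parser.
foreach ($html->find('p') as $element) {
    // plaintext holds the element's text without tags; src is only an attribute (e.g. on <img>).
    echo str_word_count($element->plaintext), "\n";
}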
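Counting words per <p> does not yet filter by length or pick up text outside paragraphs. As a rough sketch of one way to do both with PHP's built-in DOM extension (no extra library): the function name extractTextBlocks, the $threshold parameter, and the list of inline tags are all assumptions made for illustration. A "block" here is taken to be all text that shares the same nearest non-inline ancestor, which may or may not match what counts as a coherent block on a given page.

```php
<?php
// Hypothetical helper: returns text blocks with more than $threshold words.
// Assumption: text nodes belong to the same block when they share the
// nearest ancestor that is not one of the inline tags listed below.
function extractTextBlocks(string $html, int $threshold = 100): array
{
    $doc = new DOMDocument();
    // Real-world markup is often malformed; collect warnings instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    // Assumed (incomplete) list of inline elements that should not split a block.
    $inline = ['a', 'b', 'i', 'em', 'strong', 'span', 'u', 'small', 'code'];

    $xpath = new DOMXPath($doc);
    $query = '//body//text()[not(ancestor::script or ancestor::style)]';

    $grouped = [];
    foreach ($xpath->query($query) as $textNode) {
        // Climb past inline wrappers to find the enclosing block element.
        $parent = $textNode->parentNode;
        while ($parent !== null && in_array($parent->nodeName, $inline, true)) {
            $parent = $parent->parentNode;
        }
        if ($parent === null) {
            continue;
        }
        // The node path is unique per element, so it works as a grouping key.
        $key = $parent->getNodePath();
        $grouped[$key] = ($grouped[$key] ?? '') . $textNode->nodeValue;
    }

    $blocks = [];
    foreach ($grouped as $text) {
        // Collapse runs of whitespace before counting.
        $text = trim(preg_replace('/\s+/', ' ', $text));
        if (str_word_count($text) > $threshold) {
            $blocks[] = $text;
        }
    }
    return $blocks;
}
```

Calling extractTextBlocks($html) on the page markup should return an array of strings, one per qualifying block. Note that str_word_count() only counts alphabetic words (plus ' and -), so purely numeric tokens are not counted, and loadHTML() may misread non-ASCII text unless the markup declares its charset.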
Upvotes: 2