lecodesportif

Reputation: 11049

How to extract blocks of text from an HTML page?

I would like to extract blocks of text with more than 100 words from a large HTML page using PHP. Whether the text is contained in <p>...</p> doesn't matter. I only care about the number of words that make up a coherent text block, so text outside of HTML paragraphs should also be taken into consideration.

How can this be done?

Upvotes: 0

Views: 1582

Answers (2)

Jason

Reputation: 15358

I use phpQuery. Are you familiar with jQuery? They share the same syntax. You might be concerned about installing a new library, but trust me, this library is well worth the extra overhead.

phpQuery

You can then access it like this:

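// Assumes phpQuery is loaded and $html holds the page markup.
$doc = phpQuery::newDocument($html);
foreach ($doc->find('p') as $element) {
   $element = pq($element);
   // Print the word count of each paragraph's text.
   echo str_word_count($element->text()), "\n";
}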

Upvotes: 5

fredley

Reputation: 33881

Use the PHP Simple HTML DOM Parser.

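// Assumes simple_html_dom.php is included and $markup holds the page HTML.
$html = str_get_html($markup);
foreach ($html->find('p') as $element) {
   // plaintext is the element's text with tags stripped (src would be an attribute).
   echo str_word_count($element->plaintext), "\n";
}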
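
To apply the 100-word threshold from the question and also catch text that sits outside <p>, the word count can be used as a filter over every block-level element. Below is a minimal sketch that uses PHP's built-in DOM extension rather than Simple HTML DOM, so it runs without extra installs. The function name extractTextBlocks, the $minWords parameter, and the lists of block and skipped tags are all made up for illustration. It assumes a "block" is the text directly inside one block-level element, including text in inline children such as <b> or <a>.

```php
<?php
// Hypothetical helper: returns the text blocks with more than $minWords words.
function extractTextBlocks(string $html, int $minWords = 100): array
{
    // Assumption: these tags start a new block; adjust the list as needed.
    $blockTags = ['html', 'body', 'div', 'p', 'td', 'th', 'tr', 'table', 'tbody',
                  'ul', 'ol', 'li', 'blockquote', 'section', 'article', 'main',
                  'header', 'footer', 'nav', 'aside', 'form', 'pre'];
    // Assumption: these tags never contain readable text.
    $skipTags = ['head', 'script', 'style'];

    // Real-world HTML is often malformed; collect parser warnings quietly.
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $blocks = [];
    $walk = function (DOMNode $node) use (&$walk, &$blocks, $blockTags, $skipTags, $minWords) {
        $text = '';
        foreach ($node->childNodes as $child) {
            if ($child instanceof DOMText) {
                // Text sitting directly in this element.
                $text .= $child->nodeValue;
            } elseif ($child instanceof DOMElement) {
                $tag = strtolower($child->nodeName);
                if (in_array($tag, $skipTags, true)) {
                    continue;
                }
                if (in_array($tag, $blockTags, true)) {
                    // Nested block: evaluated on its own, not merged into the parent.
                    $walk($child);
                } else {
                    // Inline element: its text belongs to the current block.
                    $text .= $child->textContent;
                }
            }
        }
        // Collapse runs of whitespace before counting.
        $text = trim(preg_replace('/\s+/', ' ', $text));
        if (str_word_count($text) > $minWords) {
            $blocks[] = $text;
        }
    };
    $walk($doc);

    return $blocks;
}
```

Calling extractTextBlocks(file_get_contents('page.html')) returns an array of strings, where page.html is a placeholder path. Note that str_word_count is locale dependent and counts words made of alphabetic characters, so the threshold may need tuning for non-English pages.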

Upvotes: 2
