John Smith

Reputation: 117

Simple HTML DOM: how to echo only the text of an anchor

A summary of my code:

foreach($html->find('a') as $element) {

... and for the inner text I use this:

$element->innertext

Is there any way to echo only the text of the anchor using Simple HTML DOM? I am trying to crawl about 10k links, but in some cases, when there is other markup inside the <a> tag, it also prints the div code, image code, etc.

If the <a> tag is a simple, standard one like:

<a href="http://www.test.com">Anchor Text</a>

then $element->innertext will be "Anchor Text".

BUT

if the anchor looks like one of these:

1    <a href="http://www.test.com"><div id=whatever>Anchor Text</div></a>

or

2    <a href="http://www.test.com"><img src="whatever" /></a>

then $element->innertext will be:

Result1 <div id=whatever>Anchor Text</div>
Result2 <img src="whatever" />

Is there any way to print ONLY the text, or should I write my own custom conditions for each case (div, img, etc.)?

Upvotes: 3

Views: 2861

Answers (3)

EngineerCoder

Reputation: 1455

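// Encode multibyte characters as HTML entities so they survive later processing.
$mbHtml = mb_convert_encoding($element->innertext, 'HTML-ENTITIES', 'utf-8');
// Put a space before block-level tags so adjacent text does not run together once tags are removed.
$mbHtml = mb_eregi_replace('<(div|option|ul|li|table|tr|td|th|input|select|textarea|form)', ' <\\1', $mbHtml );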
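
One possible reading of the snippet above is that it is meant to run before tags are stripped, so that text from neighbouring block elements stays separated. The following is a minimal sketch under that assumption. The helper name `spacedText` is hypothetical, `preg_replace` is swapped in for `mb_eregi_replace`, the tag list is shortened, and the input is a hard-coded string instead of `$element->innertext`.

```php
<?php
// Hypothetical helper: insert a space before selected block-level tags,
// then strip all tags and collapse runs of whitespace.
function spacedText(string $innerHtml): string
{
    // Assumed, shortened tag list; extend it as needed.
    $spaced = preg_replace('/<(div|ul|li|table|tr|td|th)\b/i', ' <$1', $innerHtml);
    $text = strip_tags($spaced);
    // Collapse whitespace so the inserted spaces do not pile up.
    return trim(preg_replace('/\s+/', ' ', $text));
}

$innerHtml = '<div>One</div><div>Two</div>';

echo strip_tags($innerHtml), PHP_EOL; // text runs together
echo spacedText($innerHtml), PHP_EOL; // text stays separated
```

With plain `strip_tags` the two words are joined, while the spaced version keeps them apart.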

Upvotes: 0

TecBrat

Reputation: 3729

It's as simple as strip_tags($element->innertext);

The result will be an empty string if the anchor is an image.
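
A minimal sketch of this approach, applied to the three anchor shapes from the question. The helper name `anchorText` is hypothetical, and the inner HTML strings are hard-coded here in place of `$element->innertext` so the snippet runs without the Simple HTML DOM library.

```php
<?php
// Hypothetical helper (not part of Simple HTML DOM): reduce an anchor's
// inner HTML to its text content.
function anchorText(string $innerHtml): string
{
    // strip_tags() removes every HTML tag; trim() drops surrounding whitespace.
    return trim(strip_tags($innerHtml));
}

// Inner HTML samples taken from the question.
$samples = [
    'Anchor Text',
    '<div id=whatever>Anchor Text</div>',
    '<img src="whatever" />',
];

foreach ($samples as $innerHtml) {
    $text = anchorText($innerHtml);
    // An image-only anchor yields an empty string, so skip it.
    if ($text === '') {
        continue;
    }
    echo $text, PHP_EOL;
}
```

Inside the crawl loop from the question, the same call would presumably be `anchorText($element->innertext)`.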

Upvotes: 4

Indra Kumar S

Reputation: 2945

Use the plaintext property:

     strip_tags($element->plaintext)

Upvotes: 2
