Reputation: 117
Summary of my code:
foreach ($html->find('a') as $element) {
    // ... and for the inner text I use this:
    echo $element->innertext;
}
Is there any chance to echo only the anchor text using Simple HTML DOM? I am trying to crawl about 10k links, but in some cases it prints whatever is inside the <a> tag: div code, image code, etc.
If the <a> tag is standard (simple), like:
<a href="http://www.test.com">Anchor Text</a>
then $element->innertext will be "Anchor Text".
BUT
if the case is like this:
1 <a href="http://www.test.com"><div id=whatever>Anchor Text</div></a>
or
2 <a href="http://www.test.com"><img src="whatever" /></a>
my $element->innertext will be:
Result1 <div id=whatever>Anchor Text</div>
Result2 <img src="whatever" />
Is there any way to print ONLY the text, or should I write my own custom conditions for each case: div, img, etc.?
Upvotes: 3
Views: 2861
Reputation: 1455
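// Convert non-ASCII characters in the inner HTML to HTML entities
$mbHtml = mb_convert_encoding($element->innertext, 'HTML-ENTITIES', 'utf-8');
// Insert a space before each of the listed opening tags
$mbHtml = mb_eregi_replace('<(div|option|ul|li|table|tr|td|th|input|select|textarea|form)', ' <\\1', $mbHtml );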
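The two lines above appear to pad block-level tags with a space, presumably so that words from adjacent elements do not run together once the tags are removed. Here is a minimal sketch of that idea. It swaps in preg_replace for mb_eregi_replace, adds a strip_tags step that the answer does not show, and uses a hypothetical helper name, spaced_text:

```php
<?php
// Hypothetical helper (not part of Simple HTML DOM): pad the listed tags
// with a leading space, strip all tags, then collapse runs of whitespace.
function spaced_text(string $innerHtml): string
{
    // Assumption: same tag list as the answer; \b avoids matching e.g. <link> as <li>.
    $padded = preg_replace(
        '/<(div|option|ul|li|table|tr|td|th|input|select|textarea|form)\b/i',
        ' <$1',
        $innerHtml
    );
    $text = strip_tags($padded);
    return trim(preg_replace('/\s+/u', ' ', $text));
}

echo spaced_text('<div>First</div><div>Second</div>'), "\n"; // prints "First Second"
```

Without the padding step, strip_tags alone would return "FirstSecond" for that input.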
Upvotes: 0
Reputation: 3729
It's as simple as strip_tags($element->innertext);
The result will be an empty string if the anchor is an image.
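To illustrate, here is a small sketch that runs strip_tags over the three innertext values from the question. The strings are hard-coded so the snippet runs without Simple HTML DOM, and the helper name anchor_text is made up for this example:

```php
<?php
// Hypothetical helper: reduce an anchor's inner HTML to its visible text.
function anchor_text(string $innerHtml): string
{
    return trim(strip_tags($innerHtml));
}

// The three innertext values described in the question.
$samples = [
    'Anchor Text',
    '<div id=whatever>Anchor Text</div>',
    '<img src="whatever" />',
];

foreach ($samples as $innerHtml) {
    // var_dump makes the empty-string result for the image case visible.
    var_dump(anchor_text($innerHtml));
}
```

The first two samples yield "Anchor Text" and the image yields an empty string, so image-only links can be skipped with a simple empty-string check. Simple HTML DOM elements also have a plaintext property, which may be worth comparing against this approach.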
Upvotes: 4