AndrewPP

Reputation: 11

How to remove links from Simple HTML DOM data

I have this code. It gets the info I want, but it also returns the link along with the data, for example:

require_once('simple_html_dom.php');
set_time_limit(0);

// Placeholder address; a real URL needs its scheme (http:// or https://)
$url = 'www.domain.com';
$html = file_get_html($url);
// Read the outer container
foreach ($html->find('#content') as $element) {
    // Read each paragraph inside it
    foreach ($element->find('p') as $phone) {
        echo $phone;
    }
}

Mobile Pixel 2 - google << this is the link

But I need to remove this link. The problem is that the markup I scrape looks like this:

<p>the info that i really need is here<p>
     <p class="text-right"><a class="btn btn-default espbott aplus" role="button"
      href="brand/google.html">Google</a></p>

I read this: "Simple HTML Dom: How to remove elements?", but I can't find the answer there.

Update: if I use this:

foreach ($element->find('p[class="text-right"]') as $phone)

it selects the links, but I still can't remove them from the scraped data.

Upvotes: 1

Views: 341

Answers (2)

SirPilan

Reputation: 4837

Alternatively, here is a native version using DOMDocument:

PHP-CODE

$sHtml = '<p>the info that i really need is here<p>
 <p class="text-right"><a class="btn btn-default espbott aplus" role="button"
  href="brand/google.html">Google</a></p>';

$sHtml = '<div id="wrapper">' . $sHtml . '</div>';
echo "org:\n";
echo $sHtml;

echo "\n\n";

$doc = new DOMDocument();
$doc->loadHtml($sHtml);

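// Copy the live node list into an array first; removing nodes while
// iterating the list directly would skip elements when there are several links
foreach (iterator_to_array($doc->getElementsByTagName('a')) as $element) {
    $element->parentNode->removeChild($element);
}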

echo "res:\n";
echo $doc->saveHTML($doc->getElementById('wrapper'));

Output

org:
<div id="wrapper"><p>the info that i really need is here<p>
     <p class="text-right"><a class="btn btn-default espbott aplus" role="button"
      href="brand/google.html">Google</a></p></div>

res:
<div id="wrapper">
<p>the info that i really need is here</p>
<p>
     </p>
<p class="text-right"></p>
</div>

https://3v4l.org/RhuEU
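
If only the text is needed, a variation on the same idea is to drop every paragraph that wraps a link and collect the text of what remains. The following is a minimal sketch under that assumption, not a definitive implementation; the helper name paragraphsWithoutLinks is hypothetical, and the XPath expression assumes that any <p> containing an <a> is unwanted.

```php
<?php
// Hypothetical helper: returns the trimmed text of every <p> that does not contain a link.
function paragraphsWithoutLinks(string $html): array
{
    // Collect parser warnings internally instead of printing them for sloppy markup
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML('<div id="wrapper">' . $html . '</div>');
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);

    // Copy the matches into an array, then remove each paragraph that contains an <a>
    foreach (iterator_to_array($xpath->query('//p[.//a]')) as $p) {
        $p->parentNode->removeChild($p);
    }

    // Keep the text of the remaining paragraphs, skipping empty ones
    $texts = [];
    foreach ($xpath->query('//p') as $p) {
        $text = trim($p->textContent);
        if ($text !== '') {
            $texts[] = $text;
        }
    }
    return $texts;
}

$sHtml = '<p>the info that i really need is here<p>
 <p class="text-right"><a class="btn btn-default espbott aplus" role="button"
  href="brand/google.html">Google</a></p>';

print_r(paragraphsWithoutLinks($sHtml));
```

Matching on "contains a link" rather than on the text-right class is a design choice; if other paragraphs on the real page hold links that should be kept, the query could instead be narrowed to //p[@class="text-right"].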

Upvotes: 0

HamzaNig

Reputation: 1029

You can use file_get_contents with str_get_html and strip the matched elements out of the raw HTML with str_replace:

include 'simple_html_dom.php';

// $url is the page address, as in the question
$content = file_get_contents($url);
$html = str_get_html($content);

// Read the outer container
foreach ($html->find('#content') as $element) {
    // Find each paragraph that wraps a link and cut its markup out of the raw HTML
    foreach ($element->find('p[class="text-right"]') as $phone) {
        $content = str_replace($phone, '', $content);
    }
}
print $content;
die;

Upvotes: 1
