Reputation: 428
I have a web crawler that works fine for the given site. After extracting the links from the site, my PHP page outputs each one as hyperlinked text.
The problem is that the extracted links are partial (relative), so I have to prepend the scheme and host, like "http://example.com/". But when I prepend this to the extracted link, it is printed on my PHP page with unnecessary quotation marks, which break the link.
The code is as follows:
<?php
function get_datac($url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_URL, $url);
    $result = curl_exec($ch);
    curl_close($ch);
    return $result;
}

$returned_content = get_datac('http://www.usmle.net/step-1/');
$first_step = explode( '<body' , $returned_content );
$second_step = explode('</body>', $first_step[1]);
$third_step = explode('<ul>', $second_step[0]);
// print_r($third_step);
foreach ($third_step as $key=>$element) {
    $head = 'http://example.com';
    $child_first = explode( '<li' , $element );
    $child_second = explode( '</li>' , $child_first[1] );
    $child_third = explode( '<a href=' , $child_second[0] );
    $child_fourth = explode( '</a>' , $child_third[1] );
    $link = $head.$child_fourth[0];
    $final = "<a href=".$link."</a></br>";
?>
<li target="_blank" class="itemtitle">
    <span class="item_new"></span><?php echo $final?>
</li>
<?php
}
?>
Here, the link prints out as
https://example.com/"extractedlink/"
The extra quotation marks shown above break the link.
Any help is appreciated.
Upvotes: 0
Views: 42
Reputation: 4265
This is happening because the tag will be either <a href="/link"> or <a href='/link'>. Your code is correctly extracting the "/link" or '/link' parts, quotes included, so the quotes simply need removing.
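To see where the stray quotes come from, here is a minimal sketch that runs the same explode() steps on a single hard-coded list item. The sample markup and the names $html, $after_href, $fragment and $broken are assumptions for illustration, not taken from the crawled page.

```php
<?php
// Sample markup standing in for one item of the crawled page (assumed).
$html = '<li><a href="/step-1/">Step 1</a></li>';

// Same splitting approach as in the question.
$after_href = explode('<a href=', $html);
$fragment   = explode('</a>', $after_href[1])[0];

// The fragment still starts with the opening quote of the attribute value.
echo $fragment, PHP_EOL;

// Prepending the host therefore puts a quote in the middle of the URL.
$broken = 'http://example.com' . $fragment;
echo $broken, PHP_EOL;
```

The first echo shows that the quote character is part of the extracted text, which is why it ends up between the host and the path once the two are concatenated.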
This can be done using PHP's trim() function, such as:
$link = $head . trim($child_fourth[0], '\'"'); // build the link
$final = "<a href=\"".$link."</a></br>"; // add the link into the $final variable
That strips both " and ' from each end of the $child_fourth[0] variable, so it covers both cases.
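If the extracted fragment still carries the closing quote and the link text (for example "/step-1/">Step 1), trimming only the two ends leaves that inner quote in place, which may matter for single-quoted attributes. A hedged sketch of one way around this is to split the fragment at the first > and trim just the attribute part. The helper name build_anchor is hypothetical, and the sketch assumes href is the only attribute in the fragment.

```php
<?php
// Hypothetical helper (not from the original posts): turns a fragment such as
//   "/step-1/">Step 1   or   '/step-1/'>Step 1
// into a complete anchor tag. Assumes href is the only attribute present.
function build_anchor(string $head, string $fragment): string
{
    // Everything before the first '>' is the attribute value,
    // everything after it is the link text.
    $parts = explode('>', $fragment, 2);
    $text  = $parts[1] ?? '';

    // Strip whitespace, then the surrounding single or double quotes.
    $path = trim(trim($parts[0]), '\'"');

    // Re-quote the full URL consistently with double quotes.
    return '<a href="' . $head . $path . '">' . $text . '</a><br>';
}

echo build_anchor('http://example.com', '"/step-1/">Step 1'), PHP_EOL;
echo build_anchor('http://example.com', "'/step-1/'>Step 1"), PHP_EOL;
```

Both calls should yield the same anchor, since the quoting style of the source attribute no longer leaks into the output.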
Upvotes: 2