Reputation: 1
I need a function that searches the $get_webpage variable to see whether it contains my site's link code ($linktext). The function should search the whole webpage for $linktext, which should only count when it is placed after the <body> tag and before the </body> tag.
Thanks for all your help.
[[UPDATE]] Hi guys, quick update to clarify: link code on the example.com webpage that contains rel="nofollow" should not count as a match. Example:
<a href="mysite.com/"; rel="nofollow"><strong>My Site</strong></a>
$cc = new cURL();
$get_webpage=$cc->get('http://www.example.com');
$linktext='<a href="http://www.mysite.com/"><strong>My Site</strong></a>';
//####################################################################
//GET URL FUNCTION
//####################################################################
class cURL {
var $headers;
var $user_agent;
var $compression;
var $cookie_file;
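var $proxy;
var $cookies;
// PHP 8 no longer treats a method named after the class as a constructor, so use __construct.
function __construct($cookies=TRUE,$cookie='cookie.txt',$compression='gzip',$proxy='') {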
$this->headers[] = 'Accept: image/gif, image/x-bitmap, image/jpeg, image/pjpeg';
$this->headers[] = 'Connection: Keep-Alive';
$this->headers[] = 'Content-type: application/x-www-form-urlencoded;charset=UTF-8';
$this->user_agent = 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.0.3705; .NET CLR 1.1.4322; Media Center PC 4.0)';
$this->compression=$compression;
$this->proxy=$proxy;
$this->cookies=$cookies;
if ($this->cookies == TRUE) $this->cookie($cookie);
}
function cookie($cookie_file) {
if (file_exists($cookie_file)) {
$this->cookie_file=$cookie_file;
} else {
$fp = fopen($cookie_file,'w') or $this->error('The cookie file could not be opened. Make sure this directory has the correct permissions');
$this->cookie_file=$cookie_file;
fclose($fp); // close the file handle, not the file name
}
}
function get($url) {
$process = curl_init($url);
curl_setopt($process, CURLOPT_HTTPHEADER, $this->headers);
curl_setopt($process, CURLOPT_HEADER, 0);
curl_setopt($process, CURLOPT_USERAGENT, $this->user_agent);
if ($this->cookies == TRUE) curl_setopt($process, CURLOPT_COOKIEFILE, $this->cookie_file);
if ($this->cookies == TRUE) curl_setopt($process, CURLOPT_COOKIEJAR, $this->cookie_file);
curl_setopt($process,CURLOPT_ENCODING , $this->compression);
curl_setopt($process, CURLOPT_TIMEOUT, 30);
if ($this->proxy) curl_setopt($process, CURLOPT_PROXY, $this->proxy);
curl_setopt($process, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($process, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($process, CURLOPT_MAXREDIRS, 2);
$return = curl_exec($process);
curl_close($process);
return $return;
}
function post($url,$data) {
$process = curl_init($url);
curl_setopt($process, CURLOPT_HTTPHEADER, $this->headers);
curl_setopt($process, CURLOPT_HEADER, 1);
curl_setopt($process, CURLOPT_USERAGENT, $this->user_agent);
if ($this->cookies == TRUE) curl_setopt($process, CURLOPT_COOKIEFILE, $this->cookie_file);
if ($this->cookies == TRUE) curl_setopt($process, CURLOPT_COOKIEJAR, $this->cookie_file);
curl_setopt($process, CURLOPT_ENCODING , $this->compression);
curl_setopt($process, CURLOPT_TIMEOUT, 30);
if ($this->proxy) curl_setopt($process, CURLOPT_PROXY, $this->proxy);
curl_setopt($process, CURLOPT_POSTFIELDS, $data);
curl_setopt($process, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($process, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($process, CURLOPT_MAXREDIRS, 2);
curl_setopt($process, CURLOPT_POST, 1);
$return = curl_exec($process);
curl_close($process);
return $return;
}
function error($error) {
$fp = fopen("error.txt","w") or die ();
$error_text="cURL Error:$error\n";
fputs($fp,$error_text);
fclose($fp) or die ();
die;
}
}
//######################################################################
//END URL FUNCTION
//#######################################################################
Upvotes: 0
Views: 540
Reputation: 19380
I didn't know that anchors could be outside the body tag :)
First extract the inner HTML of the body tag with preg_match. You can then use a regular strpos to search it, provided you know exactly what the link looks like in the HTML.
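A minimal sketch of that approach, assuming the page has a single body element and that $linktext appears character for character in the markup. The function name containsLinkInBody is made up for illustration.

```php
<?php
// Hypothetical helper: true when $linktext occurs verbatim between <body> and </body>.
function containsLinkInBody($html, $linktext)
{
    // Capture everything between the opening and closing body tags.
    // "s" lets "." match newlines, "i" ignores the case of the tag names.
    if (!preg_match('~<body\b[^>]*>(.*)</body>~is', $html, $m)) {
        return false; // no body element found
    }
    // strpos() returns 0 for a match at the very start, so compare with !== false.
    return strpos($m[1], $linktext) !== false;
}

$get_webpage = '<html><head></head><body><p><a href="http://www.mysite.com/"><strong>My Site</strong></a></p></body></html>';
$linktext = '<a href="http://www.mysite.com/"><strong>My Site</strong></a>';
var_dump(containsLinkInBody($get_webpage, $linktext)); // bool(true)
```

Because the comparison is exact, the rel="nofollow" variant from the question does not match. The downside is that any harmless difference in whitespace, quoting or attribute order also prevents a match.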
Upvotes: 0
Reputation: 60413
The following will do it all with XPath, but assumes that you want the qualification that My Site must be within a strong tag:
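function findLinks($html, $href, $text)
{
    // Parse as HTML first (real pages are rarely valid XML), then hand the tree to SimpleXML.
    $doc = new DOMDocument();
    @$doc->loadHTML($html);
    $dom = simplexml_import_dom($doc);
    // Note: $href and $text must not contain single quotes, or the expression breaks.
    $links = $dom->xpath("//a[@href='$href']/strong[contains(., '$text')]");
    return !empty($links);
}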
If you don't care about the strong tag, you could use an XPath like:
//a[@href='$href'][contains(., '$text')]
Do some research on XPath to see what's possible. You could of course just use a simple XPath to get all the a tags and then loop over them looking for your qualifiers, as another poster suggested.
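The loop variant could look roughly like the sketch below. It assumes the qualifiers from the question: the anchor must sit inside body, point at the given href, contain the given text, and must not carry rel="nofollow". The function name hasFollowedLink is hypothetical, and the href is compared exactly, so a missing trailing slash counts as a different link.

```php
<?php
// Hypothetical helper: looks for a followed link to $href containing $text inside <body>.
function hasFollowedLink($html, $href, $text)
{
    // A failed download typically yields false or an empty string; nothing to parse then.
    if (!is_string($html) || $html === '') {
        return false;
    }
    $dom = new DOMDocument();
    // Real-world HTML is rarely well formed, so silence the parser warnings.
    @$dom->loadHTML($html);
    $xpath = new DOMXPath($dom);

    // Only anchors that are descendants of body are considered.
    foreach ($xpath->query('//body//a') as $a) {
        if ($a->getAttribute('href') !== $href) {
            continue; // different target
        }
        // rel is a space-separated token list, e.g. "external nofollow".
        $rel = preg_split('/\s+/', strtolower($a->getAttribute('rel')), -1, PREG_SPLIT_NO_EMPTY);
        if (in_array('nofollow', $rel, true)) {
            continue; // nofollow links do not count
        }
        // textContent includes the text of nested tags such as strong.
        if (strpos($a->textContent, $text) !== false) {
            return true;
        }
    }
    return false;
}
```

With the question's variables this would be called as hasFollowedLink($get_webpage, 'http://www.mysite.com/', 'My Site').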
Upvotes: 0
Reputation: 28165
There are 4 ways to do this (that I know)
I suggest the first two, perhaps DOM more than XML. See Byron's example, it ought to do the trick.
Upvotes: 0
Reputation: 53921
You can use the DOM handling functions:
$dom = new DOMDocument();
@$dom->loadHTML($html);
$x = new DOMXPath($dom);
foreach($x->query("//a") as $node)
{
if ($node->getAttribute("href") == "http://mysite.com")
{
// we got the link via href
}
if ($node->textContent == "http://mysite.com")
{
// we got the link via text
}
}
Upvotes: 1