user586011

Reputation: 1968

How To Insert Links Scraped With DOM Into A MySQL Database? (or what am I doing wrong?)

I am putting together a PHP script that pulls HTML using cURL, copies it into new pages, and saves the page names. All of that works, but I also want to collect the URLs on the page and enter them into a database. From my research, it looks like DOM is the best way to do that. However, I get "Error, insert query failed" when I include DOM in my code. Here is where I am getting the DOM code. I suspect this is a database issue.

DOM, PHP and MySQL are new to me, so any comments, pointers or suggestions would be helpful and appreciated.

Any comments on the overall approach, or suggestions of alternatives, are also quite welcome. I am not entirely convinced that DOM is best for scraping URLs from HTML.

<html>
<body>

<?
$urls=explode("\n", $_POST['url']);
$proxies=explode("\n", $_POST['proxy']);

for ( $counter = 0; $counter <= 6; $counter++) {
for ( $count = 0; $count <= 6; $count++) {

 $ch = curl_init();
 curl_setopt($ch, CURLOPT_URL,$urls[$counter]);
 curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, 0);
 curl_setopt($ch, CURLOPT_PROXY,$proxies[$count]);
 curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
 curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
 curl_setopt($ch, CURLOPT_CUSTOMREQUEST,'GET');
 curl_setopt ($ch, CURLOPT_HEADER, 1); 
curl_exec ($ch); 
$curl_scraped_page = curl_exec($ch); 

$FileName = rand(0,100000000000);
$FileHandle = fopen($FileName, 'w') or die("can't open file");
fwrite($FileHandle, $curl_scraped_page);


$dom = new DOMDocument();
@$dom->loadHTML($curl_scraped_page);
$xpath = new DOMXPath($dom);
$hrefs = $xpath->evaluate("/html/body//a");

$hostname="****";
$username="****";
$password="****";
$dbname="leadturtle";
$usertable="happyturtle";

$con=mysql_connect($hostname,$username, $password) or die ("<html><script language='JavaScript'>alert('Unable to connect to database! Please try again later.'),history.go(-1)</script></html>");
mysql_select_db($dbname ,$con);



function storeLink($url) {
    $query = "INSERT INTO happyturtle (time, ad1, ad2) VALUES ('$FileName','$url', '$gathered_from')";
    mysql_query($query) or die('Error, insert query failed');
}
for ($i = 0; $i < $hrefs->length; $i++) {
    $href = $hrefs->item($i);
    $url = $href->getAttribute('href');
    storeLink($url,$target_url);

}


mysql_close($con);

fclose($FileHandle);

curl_close($ch);

echo $FileName; 

echo "<br/>";

}
}

?>

</body>
</html>

Upvotes: 2

Views: 468

Answers (1)

VGE

Reputation: 4191

You are not escaping the values in your SQL query.

If one of your string parameters contains a ', it will cause a syntax error (best case). It can also lead to SQL injection and a big security hole (http://xkcd.com/327/ :)!
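One way to avoid the quoting problem is a prepared statement, sketched below. This is only an illustration under some assumptions: it swaps PDO in for the mysql_* calls used in the question, it uses an in-memory SQLite database so that it runs without a server (for MySQL the DSN would look something like mysql:host=...;dbname=leadturtle), and the storeLink() signature with three parameters is a hypothetical rewrite, not the asker's function.

```php
<?php
// Sketch only: PDO replaces the question's mysql_* calls, and an in-memory
// SQLite database stands in for MySQL so the example runs without a server.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE happyturtle (time TEXT, ad1 TEXT, ad2 TEXT)');

// Hypothetical rewrite of storeLink(): every value is passed in as a
// parameter instead of being read from variables outside the function.
function storeLink(PDO $pdo, string $fileName, string $url, string $gatheredFrom): void
{
    // The placeholders keep the values out of the SQL text, so a quote
    // inside $url cannot break the statement.
    $stmt = $pdo->prepare('INSERT INTO happyturtle (time, ad1, ad2) VALUES (?, ?, ?)');
    $stmt->execute([$fileName, $url, $gatheredFrom]);
}

// A URL containing a single quote, which would break a hand-built query.
$url = "http://example.com/?q=it's";
storeLink($pdo, '12345', $url, 'http://example.com/');

// Read the value back to confirm it was stored unchanged.
$stored = $pdo->query('SELECT ad1 FROM happyturtle')->fetchColumn();
echo $stored, "\n";
```

Passing the file name and source URL as arguments also means the function does not depend on variables defined elsewhere in the script.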

First check your input.

Please add the error message to your question.
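To get at the actual error message, the database layer has to be asked for it. With the mysql_* functions in the question, mysql_error() returns the text of the last error and could be appended to the die() message. The sketch below shows the same idea with PDO exceptions, again assuming PDO and an in-memory SQLite database as stand-ins so that it runs without a MySQL server; the query is built by hand on purpose to trigger the failure.

```php
<?php
// Sketch only: PDO and in-memory SQLite are stand-ins for the question's
// mysql_* calls and MySQL server.
$pdo = new PDO('sqlite::memory:');
// Make failed statements throw, so the database's own message is available.
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE happyturtle (time TEXT, ad1 TEXT, ad2 TEXT)');

// Deliberately unescaped, as in the question: the quote in the URL ends
// the string literal early and leaves invalid SQL behind it.
$url = "http://example.com/?q=it's";
$query = "INSERT INTO happyturtle (time, ad1, ad2) VALUES ('12345', '$url', '')";

$errorMessage = null;
try {
    $pdo->exec($query);
} catch (PDOException $e) {
    // Report the real cause instead of a fixed "insert query failed" string.
    $errorMessage = $e->getMessage();
    echo 'Insert failed: ', $errorMessage, "\n";
}
```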

Upvotes: 2
