colinreedy674

Reputation: 365

Duplicate detection code not working

I have a fairly simple piece of code here. I add a batch of links to the database, then check each link for a 200 OK response.

<?php
function check_alive($url, $timeout = 10) {
      $ch = curl_init($url);
      // Set request options
      curl_setopt_array($ch, array(
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_NOBODY => true,
        CURLOPT_TIMEOUT => $timeout,
        CURLOPT_USERAGENT => "page-check/1.0" 
      ));
      // Execute request
      curl_exec($ch);
      // Check if an error occurred
      if(curl_errno($ch)) {
        curl_close($ch);
        return false;
      }
      // Get HTTP response code
      $code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
      curl_close($ch);
      // Page is alive if 200 OK is received
      return $code === 200;

}

if (isset($_GET['cron'])) {
    // database connection
    $c = mysqli_connect("localhost", "paydayci_gsa", "", "paydayci_gsa");   
    //$files = scandir('Links/');
    $files = glob("Links/*.{*}", GLOB_BRACE);
    foreach($files as $file) 
    {
        $json = file_get_contents($file);
        $data = json_decode($json, true);       
        if(!is_array($data)) continue;
        foreach ($data as $platform => $urls)
        {               
            foreach($urls as $link)
            {
                //echo $link;
                $lnk = parse_url($link);
                $resUnique = $c->query("SELECT * FROM `links_to_check` WHERE `link_url` like '%".$lnk['host']."%'");
                // If no duplicate insert in database
                if(!$resUnique->num_rows)
                {
                    $i = $c->query("INSERT INTO `links_to_check` (link_id,link_url,link_platform) VALUES ('','".$link."','".$platform."')");

                }
            }
        }
        // at the very end delete the file
        unlink($file);
    }
    // check if the urls are alive
    $select = $c->query("SELECT * FROM `links_to_check` ORDER BY `link_id` ASC");
    while($row = $select->fetch_array()){   
        $alive = check_alive($row['link_url']);
        $live = "";
        if ($alive == true) 
        {
            $live = "Y";
            $lnk = parse_url($row['link_url']);
            // Check for duplicate
            $resUnique = $c->query("SELECT * FROM `links` WHERE `link_url` like '%".$row['link_url']."%'");
            echo $resUnique;
            // If no duplicate insert in database
            if(!$resUnique->num_rows)
            {
                $i = $c->query("INSERT INTO links (link_id,link_url,link_platform,link_active,link_date) VALUES ('','".$row['link_url']."','".$row['link_platform']."','".$live."',NOW())");
            }       
        }   
        $c->query("DELETE FROM `links_to_check` WHERE link_id = '".$row['link_id']."'");
    }
} 
?>

I'm trying not to add duplicate URLs to the database, but they are still getting in. Have I missed something obvious in my code? I have looked over it a few times and nothing jumps out at me.

Upvotes: 0

Views: 34

Answers (1)

Jeremy Harris

Reputation: 24579

If you are trying to enforce unique values in a database, you should rely on the database itself to enforce that constraint. You can add a unique index (assuming you are using MySQL or a variant, which the syntax suggests) like this:

ALTER TABLE `links` ADD UNIQUE INDEX `idx_link_url` (`link_url`);
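With that index in place, the SELECT-before-INSERT check is no longer needed, because the database rejects the duplicate itself. Here is a minimal sketch of how the insert could look. The function name insert_link is hypothetical, and it assumes link_id is an AUTO_INCREMENT column, so it is left out of the column list. Table and column names are taken from the question.

```php
<?php
// Hypothetical helper: rely on the UNIQUE index on link_url instead of
// running a SELECT ... LIKE check before every insert.
// Assumption: link_id is AUTO_INCREMENT, so it is not listed here.
function insert_link(mysqli $c, string $url, string $platform): bool
{
    // INSERT IGNORE skips rows that would violate the unique index.
    // A prepared statement keeps the URL out of the SQL string itself.
    $stmt = $c->prepare(
        "INSERT IGNORE INTO `links` (link_url, link_platform, link_active, link_date)
         VALUES (?, ?, 'Y', NOW())"
    );
    if ($stmt === false) {
        return false;
    }
    $stmt->bind_param('ss', $url, $platform);
    $stmt->execute();
    // affected_rows is 1 for a new row and 0 when a duplicate was ignored.
    $inserted = ($stmt->affected_rows === 1);
    $stmt->close();
    return $inserted;
}
```

Note that INSERT IGNORE also downgrades some other errors to warnings. If that is a concern, a plain INSERT that handles the duplicate-key error explicitly may be the safer choice.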

One thing to be aware of is leading or trailing whitespace, so use trim() on the values. You should also strip trailing slashes with rtrim() to keep everything consistent, so you don't get dupes.
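As a rough sketch of that normalization (the helper name normalize_link is made up for illustration), you could run every URL through something like this before inserting or comparing it:

```php
<?php
// Hypothetical helper: normalize a URL so that values differing only in
// surrounding whitespace or trailing slashes map to the same string.
function normalize_link(string $url): string
{
    // Strip leading/trailing whitespace first, then any trailing slashes.
    return rtrim(trim($url), '/');
}
```

For example, normalize_link("  http://example.com/ ") and normalize_link("http://example.com") both return "http://example.com", so the unique index treats them as the same link.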

Upvotes: 2
