Pavel K

Reputation: 235

PHP/SQL - compare a lot of images

I'm writing a script that stores a lot of images in a database. I only want to store unique images, so each time I fetch an image I have to check whether it already exists in the database. That's my problem: how can I do this check quickly when my database holds ~1,000,000 records?

My idea is to use strlen() on every image:

$image = file_get_contents('http://server.com/imageX.jpg');
$counter = strlen($image);
// $counter => for example: 105188 

Then I save this number in the database and use INSERT IGNORE INTO:

INSERT IGNORE INTO `database` (`unique_counter`, `img_url`, `img_name`) VALUES (105188, 'http://server.com/imageX.jpg', 'imageX.jpg')

If the image gets inserted, everything is fine. But I think this idea only works for ~100 images. When I have 1,000,000 images or more, and many of them have similar dimensions (width and height), the counter from my idea can be the same even when the images are not the same.
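To see why the byte length is a weak key, here is a contrived sketch (plain byte strings standing in for image data): two different payloads of identical length produce the same "counter", while a content hash such as md5() still tells them apart.

```php
<?php
// Two different "images" can easily have exactly the same length,
// so strlen() alone cannot distinguish them.
$imageA = str_repeat("\xFF", 105188);
$imageB = str_repeat("\x00", 105188);

var_dump(strlen($imageA) === strlen($imageB)); // true  - same "unique_counter"
var_dump($imageA === $imageB);                 // false - different content

// A content hash does distinguish them:
var_dump(md5($imageA) === md5($imageB));       // false
```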

Can you help? How can I compare many images against my database in a very short time?

Thanks.

Upvotes: 0

Views: 64

Answers (2)

4EACH

Reputation: 2197

$info = getimagesize('http://server.com/imageX.jpg');

$info['time'] = time();// You can add microtime if needed..

$hash = base64_encode(json_encode($info));

INSERT IGNORE INTO `database` (`hash`, `img_url`, `img_name`) VALUES ('$hash', 'http://server.com/imageX.jpg', 'imageX.jpg')

Upvotes: 0

Thamaraiselvam

Reputation: 7080

You should create a hash for the images and store it in the database alongside each record.

You can use $hash = md5_file($file_path); to get the hash for smaller files.

If you have a very large image, you can compute the hash in chunks without hitting the memory limit:

function get_hash($file_path, $limit = 0, $offset = 0) {

    if (filesize($file_path) < 15728640) { //get hash for less than 15MB images
        // md5_file is always faster if we don't chunk the file
        $hash = md5_file($file_path);

        return $hash !== false ? $hash : null;
    }

    $ctx = hash_init('md5');

    if (!$ctx) {
        // Fail to initialize file hashing
        return null;
    }

    if ($limit <= 0) {
        // Default: hash everything from $offset to the end of the file
        $limit = filesize($file_path) - $offset;
    }

    $handle = @fopen($file_path, "rb");
    if ($handle === false) {
        // Failed opening file, cleanup hash context
        hash_final($ctx);

        return null;
    }

    fseek($handle, $offset);

    while ($limit > 0) {
        // Limit chunk size to either our remaining chunk or max chunk size
        $chunkSize = $limit < 131072 ? $limit : 131072;
        $limit -= $chunkSize;

        $chunk = fread($handle, $chunkSize);
        if ($chunk === false) {
            // Read failure, cleanup
            fclose($handle);
            hash_final($ctx);

            return null;
        }
        hash_update($ctx, $chunk);
    }

    fclose($handle);

    return hash_final($ctx);
}
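As a usage sketch, the hash can then drive the duplicate check through a UNIQUE index plus INSERT IGNORE. The table and column names below are hypothetical, and the demo uses SQLite (whose INSERT OR IGNORE is the equivalent of MySQL's INSERT IGNORE) so it runs self-contained:

```php
<?php
// Sketch: dedupe downloaded files by md5_file() hash.
// Hypothetical schema; SQLite in-memory DB stands in for MySQL.
$pdo = new PDO('sqlite::memory:');
$pdo->exec('CREATE TABLE images (
    hash     TEXT PRIMARY KEY,
    img_url  TEXT,
    img_name TEXT
)');

$file = tempnam(sys_get_temp_dir(), 'img');
file_put_contents($file, random_bytes(4096)); // stand-in for a downloaded image
$hash = md5_file($file);

// SQLite spells MySQL's "INSERT IGNORE" as "INSERT OR IGNORE".
$stmt = $pdo->prepare(
    'INSERT OR IGNORE INTO images (hash, img_url, img_name) VALUES (?, ?, ?)'
);
$stmt->execute([$hash, 'http://server.com/imageX.jpg', 'imageX.jpg']);
$stmt->execute([$hash, 'http://server.com/imageX.jpg', 'imageX.jpg']); // duplicate, ignored

echo $pdo->query('SELECT COUNT(*) FROM images')->fetchColumn(), "\n"; // 1
unlink($file);
```

The second execute() hits the PRIMARY KEY on `hash` and is silently skipped, which is exactly the behaviour the question's INSERT IGNORE relies on, just with a collision-resistant key instead of a byte count.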

Upvotes: 2
