Coder

Reputation: 3103

How can I batch doesObjectExist() requests to Amazon S3?

I have a large number of items, each with its own set of S3 keys, and I need to check whether each of those keys exists.

I am using the PHP SDK (v2).

Currently I am calling $client->doesObjectExist(BUCKET, $key) for each of the keys, which is a bottleneck because of the round-trip time to S3 on every call.

I would prefer to do something like $client->doesObjectExist(BUCKET, $batch) where $batch = array($key1, $key2 ... $keyn), and for the client to check all of those keys then come back with an array of responses (or some other similar structure).

I have come across a few references to a "batch api" which sounds promising, but nothing concrete. I'm guessing that this might have been present only in the v1 SDK.

Upvotes: 4

Views: 5596

Answers (3)

ϹοδεMεδιϲ

Reputation: 2938

I am building on Jeremy Lindblom's answer.

I just want to point out the OnComplete callback that you can set up on each command.

$bucket = 'my-bucket';
$keys = array('page1.txt', 'page2.txt');

$commands = array();
foreach ($keys as $key) {
    $commands[] = $s3Client->getCommand('HeadObject', array('Bucket' => $bucket, 'Key' => $key))
        ->setOnComplete(
            function($command) use ($bucket, $key)
            {
                echo "\nBucket: $bucket\n";
                echo "\nKey: $key\n";

                // see http://goo.gl/pIWoYr for more detail on command objects
                var_dump($command->getResult());
            }
        );
}

try {
    $ex_commands = $s3Client->execute($commands);
}
catch (\Guzzle\Service\Exception\CommandTransferException $e) {
    $ex_commands = $e->getAllCommands();
}

// this is necessary; without it, the OnComplete handlers don't get called (strange?!?)
foreach ($ex_commands as $command)
{
    $command->getResult();
}

It would be wonderful if someone could shed light on why I need to call $command->getResult() to invoke the OnComplete handler.

Upvotes: 0

Jeremy Lindblom

Reputation: 6527

You can do parallel requests using the AWS SDK for PHP by taking advantage of the underlying Guzzle library features. Since the doesObjectExist method actually does HeadObject operations under the hood, you can create groups of HeadObject commands by doing something like this:

use Aws\S3\S3Client;
use Guzzle\Service\Exception\CommandTransferException;

function doObjectsExist(S3Client $s3, $bucket, array $objectKeys)
{
    $headObjectCommands = array();
    foreach ($objectKeys as $key) {
        $headObjectCommands[] = $s3->getCommand('HeadObject', array(
            'Bucket' => $bucket,
            'Key'    => $key
        ));
    }

    try {
        $s3->execute($headObjectCommands); // Executes in parallel
        return true;
    } catch (CommandTransferException $e) {
        return false;
    }
}

$s3 = S3Client::factory(array(
    'key'    => 'your_aws_access_key_id',
    'secret' => 'your_aws_secret_key',
));
$bucket = 'your_bucket_name';
$objectKeys = array('object_key_1', 'object_key_2', 'object_key_3');

// Returns true only if ALL of the objects exist
echo doObjectsExist($s3, $bucket, $objectKeys) ? 'YES' : 'NO';

If you want data from the responses, other than just whether or not the keys exist, you can change the try-catch block to do something like this instead.

try {
    $executedCommands = $s3->execute($headObjectCommands);
} catch (CommandTransferException $e) {
    $executedCommands = $e->getAllCommands();
}

// Do stuff with the command objects
foreach ($executedCommands as $command) {
    $exists = $command->getResponse()->isSuccessful() ? "YES" : "NO";
    echo "{$command['Bucket']}/{$command['Key']}: {$exists}\n";
}

Sending commands in parallel is mentioned in the AWS SDK for PHP User Guide, but I would also take a look at the Guzzle batching documentation.
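
For a large key set, it may help to send the HeadObject commands in fixed-size chunks rather than all at once. The following is a minimal sketch, not part of the SDK or Guzzle: checkKeysInChunks and its $checkChunk callable are hypothetical names, and $checkChunk is assumed to wrap the parallel execute() code above and return an array mapping each key to true or false. A stand-in checker is used here so the snippet runs without any network access.

```php
<?php
// Hypothetical helper, not part of the AWS SDK or Guzzle.
// $checkChunk is an assumed callable: it receives an array of keys and must
// return an array of key => bool (for example, by running HeadObject
// commands in parallel for that chunk).
function checkKeysInChunks(array $keys, $chunkSize, $checkChunk)
{
    $results = array();

    // array_chunk() requires $chunkSize >= 1.
    foreach (array_chunk($keys, $chunkSize) as $chunk) {
        foreach (call_user_func($checkChunk, $chunk) as $key => $exists) {
            $results[$key] = (bool) $exists;
        }
    }

    return $results;
}

// Stand-in checker for illustration only: it pretends that only keys
// ending in ".txt" exist, so no S3 request is made.
$fakeChecker = function (array $chunk) {
    $out = array();
    foreach ($chunk as $key) {
        $out[$key] = substr($key, -4) === '.txt';
    }
    return $out;
};

$results = checkKeysInChunks(array('a.txt', 'b.jpg', 'c.txt'), 2, $fakeChecker);
foreach ($results as $key => $exists) {
    echo $key . ': ' . ($exists ? 'YES' : 'NO') . "\n";
}
```

The chunk size is a tuning knob; a suitable value depends on your network and on how many concurrent requests you are comfortable sending.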

Upvotes: 6

dcro

Reputation: 13679

The only way to do a bulk check to see if some keys exist would be to list the objects in the bucket.

A list call returns up to 1000 keys per request, so it is much faster than making a doesObjectExist call for each key. But if the bucket holds a large number of keys and you only want to check a few of them, listing every object in the bucket is not practical, and your only option is to check each object individually.

The problem is not that the PHP v2 SDK lacks the bulk functionality but that the S3 API does not implement such bulk processing.
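
As a rough sketch of the listing approach, the helper below takes the objects from a bucket listing and reports which of the wanted keys are present. keysExistInListing is a hypothetical name, not an SDK function, and it assumes each listed object is an array with a 'Key' entry, which is the shape of ListObjects results. A hard-coded listing is used so the snippet runs on its own.

```php
<?php
// Hypothetical helper (not part of the AWS SDK): given the objects returned by
// a bucket listing, report which of the wanted keys are present.
// $listedObjects is assumed to be an array or Traversable of arrays that each
// carry a 'Key' entry.
function keysExistInListing($listedObjects, array $wantedKeys)
{
    // Start by assuming every wanted key is missing.
    $exists = array_fill_keys($wantedKeys, false);

    foreach ($listedObjects as $object) {
        $key = $object['Key'];
        if (array_key_exists($key, $exists)) {
            $exists[$key] = true;
        }
    }

    return $exists;
}

// Stand-in for a real listing, for illustration only.
$listing = array(
    array('Key' => 'item1/page1.txt'),
    array('Key' => 'item1/page2.txt'),
);

$result = keysExistInListing($listing, array('item1/page1.txt', 'item1/page3.txt'));
foreach ($result as $key => $found) {
    echo $key . ': ' . ($found ? 'YES' : 'NO') . "\n";
}
```

With the v2 SDK, the listing could come from $s3->getIterator('ListObjects', array('Bucket' => $bucket, 'Prefix' => $prefix)). This only pays off when the keys for an item share a prefix that narrows the listing; that layout is an assumption about your bucket.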

Upvotes: 1
