Reputation: 3103
I need to check whether a set of keys exist in S3, for each of a large number of items. (Each set of keys relates to one of the large number of items).
I am using the PHP SDK (v2)
Currently I am calling $client->doesObjectExist(BUCKET, $key)
for each of the keys, which is a bottleneck (the round-trip time to S3 for each call).
I would prefer to do something like $client->doesObjectExist(BUCKET, $batch)
where $batch = array($key1, $key2 ... $keyn)
, and for the client to check all of those keys then come back with an array of responses (or some other similar structure).
I have come across a few references to a "batch api" which sounds promising, but nothing concrete. I'm guessing that this might have been present only in the v1 SDK.
Upvotes: 4
Views: 5596
Reputation: 2938
I am building on Jeremy Lindblom's answer.
Just want to point out the OnComplete
callback that you can setup on each command.
$bucket = 'my-bucket';
$keys = array('page1.txt', 'page2.txt');
$commands = array();
foreach ($keys as $key) {
$commands[] = $s3Client->getCommand('HeadObject', array('Bucket' => $bucket, 'Key' => $key))
->setOnComplete(
function($command) use ($bucket, $key)
{
echo "\nBucket: $bucket\n";
echo "\nKey: $key\n";
// see http://goo.gl/pIWoYr for more detail on command objects
var_dump($command->getResult());
}
);
}
try {
$ex_commands = $s3Client->execute($commands);
}
catch (\Guzzle\Service\Exception\CommandTransferException $e) {
$ex_commands = $e->getAllCommands();
}
// this is necesary; without this, the OnComplete handlers wouldn't get called (strange?!?)
foreach ($ex_commands as $command)
{
$command->getResult();
}
It will be wonderful if someone could shed light on why I need to call $command->getResult()
to invoke the OnComplete
handler.
Upvotes: 0
Reputation: 6527
You can do parallel requests using the AWS SDK for PHP by taking advantage of the underlying Guzzle library features. Since the doesObjectExist
method actually does HeadObject
operations under that hood. You can create groups of HeadObject commands by doing something like this:
use Aws\S3\S3Client;
use Guzzle\Service\Exception\CommandTransferException;
function doObjectsExist(S3Client $s3, $bucket, array $objectKeys)
{
$headObjectCommands = array();
foreach ($objectKeys as $key) {
$headObjectCommands[] = $s3->getCommand('HeadObject', array(
'Bucket' => $bucket,
'Key' => $key
));
}
try {
$s3->execute($headObjectCommands); // Executes in parallel
return true;
} catch (CommandTransferException $e) {
return false;
}
}
$s3 = S3Client::factory(array(
'key' => 'your_aws_access_key_id',
'bucket' => 'your_aws_secret_key',
));
$bucket = 'your_bucket_name';
$objectKeys = array('object_key_1', 'object_key_2','object_key_3');
// Returns true only if ALL of the objects exist
echo doObjectsExist($s3, $bucket, $objectKeys) ? 'YES' : 'NO';
If you want data from the responses, other than just whether or not the keys exist, you can change the try-catch block to do something like this instead.
try {
$executedCommands = $s3->execute($headObjectCommands);
} catch (CommandTransferException $e) {
$executedCommands = $e->getAllCommands();
}
// Do stuff with the command objects
foreach ($executedCommands as $command) {
$exists = $command->getResponse()->isSuccessful() ? "YES" : "NO";
echo "{$command['Bucket']}/{$command['Key']}: {$exists}\n";
}
Sending commands in parallel is mentioned in the AWS SDK for PHP User Guide, but I would also take a look at the Guzzle batching documentation.
Upvotes: 6
Reputation: 13679
The only way to do a bulk check to see if some keys exist would be to list the objects in the bucket.
For a list call AWS returns up to 1000 keys/call so it's much faster than doing a doesObjectExist
call for each key. But if you have a large number of keys and you only want to check a couple of them, listing all the objects in the bucket will not be practical so in that case, your only option remains to check each object individually.
The problem is not that the PHP v2 SDK lacks the bulk functionality but that the S3 API does not implement such bulk processing.
Upvotes: 1