Yefb

Reputation: 146

Amazon AWSSDKforPHP too slow

Hi there,

I'm using Amazon AWSSDKforPHP to connect my web application with S3, but there's an issue with how requests are made to the service that makes it too slow.

For example, I have this code:

// Iterate an array of user images
foreach($images as $image){
    // Return the Bucket URL for this image
    $urls[] = $s3->get_object_url($bucket, 'users/'.trim($image).'.jpg', '5 minutes');
}

Supposing that $images is an array of user pictures, this returns an array called $urls that holds (as its name says) the URLs of the pictures, with credentials valid for 5 minutes. This request takes at least 6 seconds with 35 images, and that's OK. But when a picture does not exist in the bucket, I want to assign a default image for the user, something like 'images/noimage.png'. Here's the code:

// Iterate an array of user images
foreach($images as $image){

    // Check if the object exists in the Bucket
    if($s3->if_object_exists($bucket, 'users/'.trim($image).'.jpg')){
        // Return the Bucket URL for this image
        $urls[] = $s3->get_object_url($bucket, 'users/'.trim($image).'.jpg', '5 minutes');
    } else { 

        // Return the default image
        $urls[] = 'http://www.example.com/images/noimage.png';
    }

}

And the condition works, but it's SLOOOOOW. With the $s3->if_object_exists() check, the script takes at least 40 seconds for 35 images!

I have modified my script to make the request using cURL:

// Iterate an array of user images
foreach($images as $image){

    // Setup cURL
    $ch = curl_init($s3->get_object_url($bucket, 'users/'.trim($image).'.jpg', '1 minutes') );
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $response = curl_exec($ch);
    // Get just the HTTP response code
    $res = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    if($res == 200){ // The image exists
        $urls[] = $s3->get_object_url($bucket, 'users/'.trim($image).'.jpg', '5 minutes');
    }else{ // The response is 403
        $urls[] = 'http://www.example.com/images/noimage.png';
    }
}

This modified script takes between 16 and 18 seconds. That's a big difference, but it's still a lot of time :(.

Please, any help is much appreciated.

Thank you.

Upvotes: 5

Views: 1937

Answers (3)

Ryan Parman

Reputation: 6945

It's slow because you're calling if_object_exists() on every iteration of the loop, which kicks off a network request to AWS each time.

The user "thatidiotguy" said:

I do not know about the S3 API, but could you ask for a list of files in the bucket and do the string matching/searching yourself in the script? There is no way 34 string match tests should take anywhere near that long in a PHP script.

He's right.

Instead of calling if_object_exists(), call get_object_list() once at the beginning of the script, then compare each user photo key against that list using PHP's in_array() function.
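
For illustration, here's a rough sketch of that approach. It assumes get_object_list() accepts a 'prefix' option and returns a flat array of object keys; adjust to your bucket layout:

// Fetch the list of object keys under the users/ prefix once, up front
$existing = $s3->get_object_list($bucket, array('prefix' => 'users/'));

// Iterate an array of user images
foreach($images as $image){

    $key = 'users/'.trim($image).'.jpg';

    // One local string comparison instead of one HTTP request per image
    if(in_array($key, $existing)){
        $urls[] = $s3->get_object_url($bucket, $key, '5 minutes');
    } else {
        $urls[] = 'http://www.example.com/images/noimage.png';
    }
}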

You should see a speed-up of approximately a zillion percent. Don't quote me on that, though. ;)

Upvotes: 1

Mike Brant

Reputation: 71404

I would think that if you want to read directory-type information from S3, you might be best off using something like s3fs to mount your bucket as a local file system. s3fs can also be configured with a local cache to speed things up (cache on fast ephemeral storage if you are using EC2).

This would allow you to do regular PHP directory handling (DirectoryIterator, etc.) with ease.
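
As a rough sketch of what that could look like, assuming the bucket's users/ folder is mounted via s3fs at /mnt/s3bucket/users (the mount path is hypothetical):

// Hypothetical s3fs mount point for the bucket's users/ folder
$mount = '/mnt/s3bucket/users';

// Build a set of existing filenames with a single directory scan
$existing = array();
foreach(new DirectoryIterator($mount) as $file){
    if($file->isFile()){
        $existing[$file->getFilename()] = true;
    }
}

foreach($images as $image){
    $name = trim($image).'.jpg';
    if(isset($existing[$name])){
        $urls[] = $s3->get_object_url($bucket, 'users/'.$name, '5 minutes');
    } else {
        $urls[] = 'http://www.example.com/images/noimage.png';
    }
}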

If this is more than you want to mess with, at least store the filename data in a database and just expect the files to be in their proper S3 locations, or cache the results of individual API checks locally in some manner so that you don't need to make an API call for each similar request.

Upvotes: 1

sean

Reputation: 3985

Why not change how you are doing your checks? Store the locations/buckets of the images locally in a database; that way you do not have to worry about this check at request time.

This minimizes the number of API calls you are making, which is 35 in your case now but could grow much larger over time. And you are not making just one call per image but, for the most part, two. That is highly inefficient and relies on your network connection being fairly fast.

Moving the location data, and the knowledge of whether an image exists, into a local store is a much better choice for performance here. This check should also only have to be done a single time anyway if you store the result ahead of time.
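
For example, a minimal sketch using PDO, assuming a hypothetical user_images table with an image_key column that gets filled in when images are uploaded:

// Hypothetical table user_images(image_key), populated at upload time
$pdo = new PDO('mysql:host=localhost;dbname=app', 'dbuser', 'dbpass');

$keys = array();
foreach($images as $image){
    $keys[] = 'users/'.trim($image).'.jpg';
}

// One local query replaces dozens of round-trips to S3
$placeholders = implode(',', array_fill(0, count($keys), '?'));
$stmt = $pdo->prepare("SELECT image_key FROM user_images WHERE image_key IN ($placeholders)");
$stmt->execute($keys);
$existing = $stmt->fetchAll(PDO::FETCH_COLUMN);

foreach($keys as $key){
    $urls[] = in_array($key, $existing)
        ? $s3->get_object_url($bucket, $key, '5 minutes')
        : 'http://www.example.com/images/noimage.png';
}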

Upvotes: 1
