Reputation: 91
I am outsourcing an image classification task to Amazon Mechanical Turk. The task uses a CSV file that stores the URLs of the images the workers classify. According to the example in the docs, the images these URLs point to need to be publicly available so that the workers can access them.
However, my data is sensitive and cannot be hosted publicly. Is there any way to use MTurk with images that have restricted access?
Upvotes: 2
Views: 542
Reputation: 68
I suggest hosting the images in a private S3 bucket and generating presigned URLs that expire after a chosen number of seconds (the expiration parameter in the function below). Workers on MTurk can then view the HIT images through the presigned URL, and once expiration seconds have passed the URL stops working, so nobody can use it to access the sensitive data anymore.
import logging

import boto3
from botocore.exceptions import ClientError


def create_presigned_url(bucket: str, key: str, expiration: int):
    """Generate a presigned URL to share an S3 object

    :param bucket: name of the bucket
    :param key: key of the object for which to create a presigned URL
    :param expiration: Time in seconds for the presigned URL to remain valid
    :return: Presigned URL as string. If error, returns None.
    """
    # Generate a presigned URL for the S3 object
    s3_client = boto3.client('s3')
    try:
        response = s3_client.generate_presigned_url(
            'get_object',
            Params={'Bucket': bucket, 'Key': key},
            ExpiresIn=expiration
        )
    except ClientError as e:
        logging.error(e)
        return None

    # The response contains the presigned URL
    return response
More information on generating presigned URLs is available here: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/s3-presigned-urls.html
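Since the question mentions a CSV file of image URLs, here is a minimal sketch of how the presigned URLs could be written into such a file. The function name write_hit_csv, the column name image_url, and the sign callable are assumptions made for illustration, not part of the MTurk or boto3 APIs; the column name has to match whatever variable your HIT template actually uses. Pick an expiration that outlasts the lifetime of the HIT, otherwise workers will see broken images.

```python
import csv
from typing import Callable, Iterable, Optional, TextIO


def write_hit_csv(
    keys: Iterable[str],
    sign: Callable[[str], Optional[str]],
    out: TextIO,
    column: str = "image_url",  # hypothetical column name; match your HIT template
) -> int:
    """Write one presigned URL per row and return the number of rows written.

    `sign` maps an S3 object key to a presigned URL, or None on failure.
    """
    writer = csv.writer(out)
    writer.writerow([column])  # header row
    written = 0
    for key in keys:
        url = sign(key)
        if url is None:
            # Signing failed for this key; skip it rather than emit an empty row
            continue
        writer.writerow([url])
        written += 1
    return written
```

With the function from this answer, sign could be something like lambda key: create_presigned_url("my-private-bucket", key, 3 * 24 * 3600), where the bucket name and the three-day expiration are placeholders. Passing the signer in as a callable also makes it possible to try out the CSV step without AWS credentials.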
Upvotes: 2