Anthony

Reputation: 35928

How to access an item from S3 using boto3 and read() its contents

I have a method that fetches a file from a URL and converts it to an OpenCV image:

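import urllib  # Python 2; on Python 3 use urllib.request.urlopen instead

import cv2
import numpy as np

def my_method(self, imgurl):
   req = urllib.urlopen(imgurl)
   r = req.read()
   arr = np.asarray(bytearray(r), dtype=np.uint8)
   image = cv2.imdecode(arr,-1) # 'load it as it is'
   return image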

I would like to use boto3 to access an object in an S3 bucket and convert it to an image, just as the method above does. However, I'm not sure how to access an item in a bucket using boto3, or how to read() the contents of that item afterwards.

Below is what I've tried:

>>> import botocore
>>> import boto3
>>> client = boto3.client('s3',aws_access_key_id="myaccsskey",aws_secret_access_key="secretkey")
>>> bucketname = "mybucket"
>>> itemname = "demo.png"

Questions

  1. How can I access a particular item from a bucket using boto3?
  2. Is there a way to read the contents of the accessed item similar to what I'm doing in my_method using req.read()?

Upvotes: 1

Views: 8141

Answers (2)

ciurlaro

Reputation: 1014

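As I explained in another answer, downloading the object into an in-memory buffer with a multi-threaded transfer configuration is a fast way to read from an S3 file, especially for large objects: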

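import io
import boto3
from boto3.s3.transfer import TransferConfig

client = boto3.client('s3')
buffer = io.BytesIO()

# Placeholder values, taken from the question
bucket_name = "mybucket"
object_key = "demo.png"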

# This is just an example; the parameters should be tuned according to:
# 1. The size of the object being read (the bigger the file, the bigger the chunks)
# 2. The number of threads available on the machine that runs this code
# Note that TransferConfig sizes are given in bytes.

config = TransferConfig(
    multipart_threshold=1024 * 1024 * 25,   # Concurrent read only if object size > 25MB
    max_concurrency=10,                     # Up to 10 concurrent readers
    multipart_chunksize=1024 * 1024 * 25,   # 25MB chunks per reader
    use_threads=True                        # Must be True to enable multiple readers
)

client.download_fileobj(
    Bucket=bucket_name, 
    Key=object_key, 
    Fileobj=buffer,
    Config=config
)

body = buffer.getvalue()  # raw bytes; call .decode() only if the object is text, not an image
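
The download step above can be wrapped in a small helper that returns the object's raw bytes. This is only a sketch: the name `download_to_bytes` is hypothetical, and the client is passed in as an argument (assumed to be something like `boto3.client('s3')`) so that the helper itself does not depend on any credentials setup.

```python
import io


def download_to_bytes(client, bucket_name, object_key, config=None):
    """Download an S3 object into memory and return its raw bytes.

    `client` is assumed to be a boto3 S3 client; `config` is an optional
    boto3.s3.transfer.TransferConfig (None means boto3's defaults).
    """
    buffer = io.BytesIO()
    client.download_fileobj(
        Bucket=bucket_name,
        Key=object_key,
        Fileobj=buffer,
        Config=config,
    )
    return buffer.getvalue()


# Hypothetical usage:
#   raw = download_to_bytes(boto3.client('s3'), "mybucket", "demo.png")
#   text = raw.decode("utf-8")   # only for objects that contain text
```

For a binary object such as the PNG in the question, the returned bytes should be kept as they are rather than decoded to a string.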

Upvotes: 0

stellasia

Reputation: 5612

For question 1, I would do it this way:

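import boto3

# endpoint_url and use_ssl=False point at a local S3-compatible test server;
# omit them (and pass real credentials) when talking to AWS itself.
s3 = boto3.resource('s3',
                     use_ssl=False,
                     endpoint_url="http://localhost:4567",
                     aws_access_key_id="",
                     aws_secret_access_key="",
)
obj = s3.Object(bucketname, itemname)  # bucketname and itemname as defined in the question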

For question 2, I have never tried it myself, but this SO answer suggests:

body = obj.get()['Body'].read()

using the high-level resource API provided by boto3.
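
Putting both parts together with the decoding step from the question's my_method might look roughly like the sketch below. The function names `read_s3_object` and `s3_image` are hypothetical, and the resource is passed in as an argument (assumed to be something like `boto3.resource('s3')`). NumPy and OpenCV are imported inside `s3_image` only so that the plain read helper can be used without them installed.

```python
def read_s3_object(s3_resource, bucket_name, item_name):
    """Return the raw bytes of an S3 object via the boto3 resource API."""
    obj = s3_resource.Object(bucket_name, item_name)
    # get() returns a dict; its 'Body' entry is a file-like stream with read()
    return obj.get()["Body"].read()


def s3_image(s3_resource, bucket_name, item_name):
    """Fetch an S3 object and decode it as an OpenCV image."""
    import cv2
    import numpy as np

    raw = read_s3_object(s3_resource, bucket_name, item_name)
    arr = np.asarray(bytearray(raw), dtype=np.uint8)
    # -1 (cv2.IMREAD_UNCHANGED) loads the image as it is, as in my_method
    return cv2.imdecode(arr, -1)


# Hypothetical usage:
#   s3 = boto3.resource('s3')
#   image = s3_image(s3, "mybucket", "demo.png")
```

Note that cv2.imdecode returns None when the bytes are not a valid image, so it may be worth checking the result before using it.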

Upvotes: 4
