qxzsilver

Reputation: 655

How to access AWS S3 data using boto3

I am fairly new to both S3 as well as boto3. I am trying to read in some data in the following format:

https://blahblah.s3.amazonaws.com/data1.csv
https://blahblah.s3.amazonaws.com/data2.csv
https://blahblah.s3.amazonaws.com/data3.csv

I am importing boto3, and it seems like I would need to do something like:

import boto3
s3 = boto3.client('s3')

However, what should I do after creating this client if I want to read all the files separately into memory? (I am not supposed to download this data locally.) Ideally, I would like to read each CSV file into a separate Pandas DataFrame (which I know how to do once I can access the S3 data).

Please understand I'm fairly new to both boto3 as well as S3, so I don't even know where to begin.

Upvotes: 0

Views: 2482

Answers (2)

fixatd

Reputation: 1404

You have two options, both of which you've already touched on:

  1. Downloading the file locally using download_file
s3.download_file(
    "<bucket-name>", 
    "<key-of-file>", 
    "<local-path-where-file-will-be-downloaded>"
)

See download_file

  2. Loading the file contents into memory using get_object
response = s3.get_object(Bucket="<bucket-name>", Key="<key-of-file>")
contentBody = response.get("Body")
# You need to read the content as it is a Stream
content = contentBody.read()

See get_object

Either approach is fine; just choose whichever one fits your scenario better.
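Since the end goal is Pandas DataFrames, the get_object route above can be wrapped in a small helper. This is a sketch: the bucket name "blahblah" and the keys are assumptions taken from the URLs in the question.

```python
import pandas as pd

def read_csv_from_s3(s3_client, bucket, key):
    """Fetch a CSV object from S3 and parse it into a DataFrame without touching disk."""
    response = s3_client.get_object(Bucket=bucket, Key=key)
    # response["Body"] is a streaming body; pandas can read it directly
    return pd.read_csv(response["Body"])

# Usage (requires AWS credentials; bucket and keys are guesses from the question's URLs):
# import boto3
# s3 = boto3.client("s3")
# frames = {key: read_csv_from_s3(s3, "blahblah", key)
#           for key in ["data1.csv", "data2.csv", "data3.csv"]}
```

Reading the streaming body straight into pd.read_csv avoids ever writing the file locally, which matches the constraint in the question.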

Upvotes: 3

GRVPrasad

Reputation: 1142

Try this:

import boto3
s3 = boto3.resource('s3')
obj = s3.Object('<bucket-name>', '<item-name>')
body = obj.get()['Body'].read()
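To get from the raw bytes that obj.get()['Body'].read() returns to a DataFrame, the bytes can be parsed via io.BytesIO. The bucket iteration in the usage comment is a sketch; the bucket name is a guess from the question's URLs.

```python
import io
import pandas as pd

def csv_bytes_to_dataframe(body_bytes):
    """Turn raw CSV bytes (the result of obj.get()['Body'].read()) into a DataFrame."""
    return pd.read_csv(io.BytesIO(body_bytes))

# Usage with the resource API (credentials required; the bucket name is an assumption):
# import boto3
# s3 = boto3.resource('s3')
# bucket = s3.Bucket('blahblah')
# frames = {obj.key: csv_bytes_to_dataframe(obj.get()['Body'].read())
#           for obj in bucket.objects.all() if obj.key.endswith('.csv')}
```

Iterating bucket.objects.all() also handles the "read in all files" part of the question without hard-coding each key.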

Upvotes: 2
