RustyShackleford

Reputation: 3667

How to pull only certain CSVs from S3 and concatenate the data?

I have a bucket with various files. I am only interested in pulling files whose names begin with the word 'member', and storing each member file in a list to be concatenated later into a dataframe.

Currently I am pulling data like this:

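import io

import boto3
import pandas as pd

s3 = boto3.resource('s3')
my_bucket = s3.Bucket('my-bucket')

# Fetch the single object whose key is exactly 'member'
obj = s3.Object('my-bucket', 'member')
file_content = obj.get()['Body'].read().decode('utf-8')
# read_csv expects a path or file-like object, so wrap the text in StringIO
df = pd.read_csv(io.StringIO(file_content))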

However, this only pulls the file named 'member'. I have member files with names like 'member_1229013', 'member_2321903', etc.

How can I read in all the 'member' files and save the data in a list so I can concatenate it later? The column names are the same in all the CSVs.

Upvotes: 1

Views: 234

Answers (1)

John Rotenstein

Reputation: 269101

You can only download/access one object per API call.

I normally recommend downloading the objects to a local directory, and then accessing them as normal local files. Here is an example of how to download an object from Amazon S3:

import boto3

s3 = boto3.client('s3')
s3.download_file('mybucket', 'hello.txt', '/tmp/hello.txt')

See: download_file() documentation

If you want to read multiple files, you will first need to obtain a listing of the files (eg with list_objects_v2()), and then access each object individually.
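As a rough sketch of that list-then-fetch pattern, applied to the 'member' files from the question: the function names list_member_keys and read_member_frames are hypothetical, and this version reads each object into memory rather than downloading it to disk first. It assumes every matching object is a CSV with the same columns. The S3 client is passed in as a parameter so the functions make no assumptions about credentials or region.

```python
import io

import pandas as pd


def list_member_keys(client, bucket, prefix="member"):
    """Return every object key in `bucket` that starts with `prefix`."""
    keys = []
    # A single list_objects_v2 call returns at most 1,000 keys,
    # so use a paginator to walk through all result pages.
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent from a page when nothing matches.
        for item in page.get("Contents", []):
            keys.append(item["Key"])
    return keys


def read_member_frames(client, bucket, prefix="member"):
    """Fetch each matching object (one API call per object) as a DataFrame."""
    frames = []
    for key in list_member_keys(client, bucket, prefix):
        body = client.get_object(Bucket=bucket, Key=key)["Body"].read()
        frames.append(pd.read_csv(io.BytesIO(body)))
    return frames


# Example usage (needs AWS credentials; 'my-bucket' is the bucket name
# from the question):
#   import boto3
#   frames = read_member_frames(boto3.client("s3"), "my-bucket")
#   df = pd.concat(frames, ignore_index=True)
```

Note that pd.concat raises an error on an empty list, so it may be worth checking that at least one key matched before concatenating.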

One tip for boto3... There are two ways to make calls: via a Resource (eg using s3.Object() or s3.Bucket()) or via a Client, which passes everything as parameters.
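To illustrate the difference, here is a minimal sketch of listing keys by prefix in both styles. The helper names keys_via_resource and keys_via_client are made up for this example; the resource and client objects are assumed to come from boto3.resource('s3') and boto3.client('s3') respectively.

```python
def keys_via_resource(s3_resource, bucket, prefix):
    """Resource style: work with Bucket objects and their collections."""
    # objects.filter() handles pagination behind the scenes.
    return [obj.key for obj in s3_resource.Bucket(bucket).objects.filter(Prefix=prefix)]


def keys_via_client(s3_client, bucket, prefix):
    """Client style: everything is passed as keyword parameters."""
    # A single call returns at most 1,000 keys; larger listings need
    # a paginator or continuation tokens.
    response = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix)
    return [item["Key"] for item in response.get("Contents", [])]


# Example usage (needs AWS credentials):
#   import boto3
#   keys_via_resource(boto3.resource("s3"), "my-bucket", "member")
#   keys_via_client(boto3.client("s3"), "my-bucket", "member")
```

Either style works; the main thing is not to mix them up, since s3.Object() and s3.Bucket() exist only on a resource, while download_file() with bucket and key parameters as shown above is a client call.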

Upvotes: 1
