Dinesh Kumar

Reputation: 75

How can I load CSV files as separate dataframes from an S3 bucket in Python?

import boto3

s3 = boto3.client('s3')

def get_s3_keys(bucket, prefix):
    """Get a list of keys in an S3 bucket."""
    resp = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)
    keys = []
    for obj in resp['Contents']:
        # collect every key instead of overwriting a single variable
        keys.append(obj['Key'])
        print(obj['Key'])
    return keys


keys = get_s3_keys('bucket', 'folder')
print(keys)

I use the above function to get the keys and I see 3 CSV files. I want them to be imported into separate dataframes.

Upvotes: 1

Views: 571

Answers (1)

baduker

Reputation: 20022

I have three files in my S3 bucket: airtravel.csv, cities.csv, and tally_cab.csv, which were taken from here.

I use this:

import boto3
import pandas as pd

my_bucket = "eine-spinne"
s3 = boto3.client("s3")
resp = s3.list_objects_v2(Bucket=my_bucket)

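# read each CSV object in the bucket into its own dataframe, keyed by the S3 object key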
dfs = {}
for file in resp['Contents']:
    key = file['Key']
    if key.endswith(".csv"):
        object_data = s3.get_object(Bucket=my_bucket, Key=key)
        dfs[key] = pd.read_csv(object_data['Body'])

print(type(dfs['airtravel.csv']), "\n", dfs['airtravel.csv'].iloc[0])

This creates a dict of three dataframes, keyed by the S3 object keys.

Sample output:

<class 'pandas.core.frame.DataFrame'> 
 Month      JAN
 "1958"    340
 "1959"    360
 "1960"    417
Name: 0, dtype: object
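
Not needed for three files, but a single list_objects_v2 call returns at most 1,000 keys, so for larger buckets a paginator is the safer way to collect them. A minimal sketch of the same approach with pagination, plus pulling one dataframe back out of the dict (same bucket name as above):

import boto3
import pandas as pd

my_bucket = "eine-spinne"
s3 = boto3.client("s3")

# list_objects_v2 caps each response at 1,000 keys, so paginate over all pages
paginator = s3.get_paginator("list_objects_v2")

dfs = {}
for page in paginator.paginate(Bucket=my_bucket):
    for obj in page.get("Contents", []):
        key = obj["Key"]
        if key.endswith(".csv"):
            body = s3.get_object(Bucket=my_bucket, Key=key)["Body"]
            dfs[key] = pd.read_csv(body)

# each dataframe is looked up by its S3 object key
print(dfs["cities.csv"].head())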

Upvotes: 1
