Bouji

Reputation: 316

Loading S3File in AWS

I'm trying to download my CIFAR-10 data from S3 so I can train on it in AWS SageMaker.

I'm using this code to load the data:

import pickle

import s3fs

fs = s3fs.S3FileSystem()

def unpickle(file):
    # CIFAR-10 batches are pickled with bytes keys
    data_dict = pickle.load(file, encoding='bytes')
    return data_dict

with fs.open('s3://bucket_name/data_batch_1', 'rb') as f:
    data = unpickle(f)

I'm getting the error "EOFError: Ran out of input" from the unpickle function. I assume the file object is empty, but I've tried several ways of getting the data from my bucket and can't seem to get it right.
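
For reference, this EOFError typically means pickle.load hit the end of the stream before reading a complete object, most often because the file is empty. A quick diagnostic sketch, reusing the placeholder path from above (fs.info and fs.cat are standard s3fs calls):

import pickle
import s3fs

fs = s3fs.S3FileSystem()
path = 's3://bucket_name/data_batch_1'  # placeholder path from above

# A reported size of 0 would explain "Ran out of input"
print(fs.info(path)['size'])

# Alternatively, pull the whole object into memory and unpickle the bytes
raw = fs.cat(path)
data = pickle.loads(raw, encoding='bytes')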

Upvotes: 0

Views: 229

Answers (1)

Michael Grogan

Reputation: 1016

Unless you have granted the appropriate permissions in IAM for the user to access the S3 bucket, the easiest fix is to grant public access, i.e. make sure all of the "Block public access" options for the bucket are unchecked, as below.

[Screenshot: the bucket's "Block public access" settings with all options unchecked]
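
If you would rather keep the bucket private, one quick way to verify that the notebook's role can actually read the object is a head_object call with boto3. This is just a sketch; the bucket and key names are placeholders:

import boto3
from botocore.exceptions import ClientError

s3 = boto3.client('s3')

try:
    # head_object raises ClientError (e.g. 403 Forbidden or
    # 404 Not Found) if the role cannot see the object
    s3.head_object(Bucket='databucketname', Key='data_batch_1')
    print('Object is readable')
except ClientError as err:
    print(err.response['Error']['Code'])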

Then, importing the dataset from S3 into SageMaker is straightforward. Here is an example that reads a CSV directly from the bucket with pandas:

import boto3
import botocore
import pandas as pd
from sagemaker import get_execution_role

# IAM role attached to the SageMaker notebook instance
role = get_execution_role()

bucket = 'databucketname'
data_key = 'datasetname.csv'
data_location = 's3://{}/{}'.format(bucket, data_key)

# pandas can read straight from an S3 URI
train_df = pd.read_csv(data_location)
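
Since the files in the question are pickled CIFAR-10 batches rather than CSVs, the same idea with boto3's get_object would look roughly like this (bucket and key names are placeholders):

import pickle

import boto3

s3 = boto3.client('s3')

# get_object returns a streaming body; read it fully, then unpickle the bytes
obj = s3.get_object(Bucket='databucketname', Key='data_batch_1')
batch = pickle.loads(obj['Body'].read(), encoding='bytes')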

Hope this helps.

Upvotes: 1
