I'm trying to load my CIFAR-10 data, which is stored in S3, so I can train a model on it in AWS SageMaker.
I'm using this code to load the data:
import pickle
import s3fs
fs = s3fs.S3FileSystem()
def unpickle(file):
    # CIFAR-10 batches were pickled under Python 2, hence encoding='bytes'
    batch = pickle.load(file, encoding='bytes')
    return batch
with fs.open('s3://bucket_name/data_batch_1', 'rb') as f:
    data = unpickle(f)
I'm getting the error "EOFError: Ran out of input" from the unpickle function. I assume the file is empty, but I have tried different ways to get the data from my bucket and can't seem to get it right.
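`EOFError: Ran out of input` is what `pickle.load` raises when the stream it is given has no bytes left, so it usually points to an object that is empty (for example a zero-byte upload) or a stream that was already read to the end. A minimal sketch that makes that case explicit, assuming any binary file-like object such as the one `fs.open(..., 'rb')` returns; the helper name `load_cifar_batch` is hypothetical:

```python
import io
import pickle


def load_cifar_batch(fileobj):
    """Unpickle one CIFAR-10 batch from a binary file-like object.

    Reads the whole object into memory first, so an empty download is
    reported with a clear message instead of pickle's EOFError.
    """
    raw = fileobj.read()
    if not raw:
        # Zero bytes: the object is empty or the stream was already consumed.
        raise ValueError("object is empty (0 bytes); check the S3 key and upload")
    # encoding='bytes' is needed for batches pickled under Python 2.
    return pickle.loads(raw, encoding="bytes")


# Offline demonstration with an in-memory stream standing in for the S3 file.
demo = io.BytesIO(pickle.dumps({b"labels": [6, 9]}))
print(load_cifar_batch(demo))
```

With s3fs it would be called as `load_cifar_batch(f)` inside the `with fs.open(...)` block. Reading everything into memory is reasonable here because each CIFAR-10 batch is only about 30 MB.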
Unless the IAM role you are using has been granted permission to read the S3 bucket, the easiest fix is to allow public access to the bucket: in the bucket's Permissions tab, make sure all of the "Block public access" settings are unchecked.
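Granting the IAM permissions mentioned above avoids making the data public. A minimal sketch of a read-only policy for a single bucket, built as a Python dict so it can be printed as JSON; the function name `read_only_s3_policy` is hypothetical, and attaching the result to the SageMaker execution role (for example as an inline policy in the IAM console) is assumed:

```python
import json


def read_only_s3_policy(bucket):
    """Build a minimal IAM policy that lets a role list and read one bucket.

    s3:ListBucket applies to the bucket ARN itself; s3:GetObject applies
    to the objects inside it, hence the /* suffix.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


# "bucket_name" is the placeholder from the question.
print(json.dumps(read_only_s3_policy("bucket_name"), indent=2))
```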
Then you can load the dataset from S3 inside SageMaker. Here is an example that reads a CSV file straight from the bucket with pandas:
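import boto3
import botocore
import pandas as pd
from sagemaker import get_execution_role
# IAM role the notebook runs under; it needs read access to the bucket
role = get_execution_role()
bucket = 'databucketname'
data_key = 'datasetname.csv'
data_location = 's3://{}/{}'.format(bucket, data_key)
# pandas resolves s3:// URLs through s3fs, using the notebook's credentials
train_df = pd.read_csv(data_location)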
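The example above imports `boto3` but does the actual read with pandas, which only covers CSV-style files. For pickled CIFAR-10 batches like the ones in the question, a boto3-based sketch could look like this; `load_pickle_from_s3` is a hypothetical helper, and the client is assumed to be `boto3.client('s3')` running with the notebook's execution role:

```python
import pickle


def load_pickle_from_s3(client, bucket, key):
    """Download one object with an S3 client and unpickle it.

    get_object returns a dict whose "Body" is a stream with a .read() method.
    """
    body = client.get_object(Bucket=bucket, Key=key)["Body"].read()
    if not body:
        raise ValueError(f"s3://{bucket}/{key} is empty")
    # encoding='bytes' matches the loader used for CIFAR-10 batches.
    return pickle.loads(body, encoding="bytes")


# Usage inside SageMaker (bucket and key are the question's placeholders):
# import boto3
# batch = load_pickle_from_s3(boto3.client("s3"), "bucket_name", "data_batch_1")
```

Passing the client in, rather than creating it inside the function, keeps the helper easy to exercise with a stub instead of a live bucket.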
Hope this helps.