Reputation: 55
I am trying to link my S3 bucket to a notebook instance, but I am not able to.
Here is what I have so far:
from sagemaker import get_execution_role
role = get_execution_role()
bucket = 'atwinebankloadrisk'
data_location = 's3://{}/'.format(bucket)
output_location = 's3://{}/'.format(bucket)
To read the data from the bucket:
df_test = pd.read_csv(data_location/'application_test.csv')
df_train = pd.read_csv('./application_train.csv')
df_bureau = pd.read_csv('./bureau_balance.csv')
However, I keep getting errors and am unable to proceed. I haven't found answers that help much.
PS: I am new to AWS.
Upvotes: 3
Views: 19548
Reputation: 1146
import boto3

# files are referred to as objects in S3
# a file name is referred to as a key name in S3
def write_to_s3(filename, bucket_name, key):
    with open(filename, 'rb') as f:  # read in binary mode
        return boto3.Session().resource('s3').Bucket(bucket_name).Object(key).upload_fileobj(f)

# simply call the write_to_s3 function with the required arguments
write_to_s3('file_name.csv',
            'bucket_name',
            'file_name.csv')
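Since the question is about reading rather than writing, here is the mirror-image sketch using the same object API; download_fileobj is the boto3 counterpart, and the bucket and file names below are placeholders:
import boto3

def read_from_s3(filename, bucket_name, key):
    # download the S3 object identified by key into a local file
    with open(filename, 'wb') as f:  # write in binary mode
        boto3.Session().resource('s3').Bucket(bucket_name).Object(key).download_fileobj(f)

read_from_s3('file_name.csv',
             'bucket_name',
             'file_name.csv')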
Upvotes: 0
Reputation: 71
In pandas 1.0.5, if you've already granted the notebook instance access to S3, reading a CSV from S3 is as easy as this (https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#reading-remote-files):
df = pd.read_csv('s3://<bucket-name>/<filepath>.csv')
During the notebook setup process I attached an AmazonSageMakerFullAccess policy to the notebook instance, granting it access to the S3 bucket. You can also do this via the IAM Management Console.
If you need credentials, there are three ways of providing them (https://s3fs.readthedocs.io/en/latest/#credentials):
- the aws_access_key_id, aws_secret_access_key, and aws_session_token environment variables
- configuration files such as ~/.aws/credentials
- the IAM role attached to the instance (for nodes on EC2, for example)
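For example, a minimal sketch of the environment-variable route (the values are placeholders; in a SageMaker notebook the attached IAM role usually makes this step unnecessary):
import os
import pandas as pd

# placeholder credentials; never hardcode real keys in a notebook
os.environ['AWS_ACCESS_KEY_ID'] = '...'
os.environ['AWS_SECRET_ACCESS_KEY'] = '...'

df = pd.read_csv('s3://<bucket-name>/<filepath>.csv')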
Upvotes: 2
Reputation: 5568
You're trying to use Pandas to read files from S3. By itself, Pandas can read files from your local disk, but not directly from S3.
Instead, download the files from S3 to your local disk, then use Pandas to read them.
import boto3
import botocore

BUCKET_NAME = 'my-bucket'  # replace with your bucket name
KEY = 'my_image_in_s3.jpg'  # replace with your object key

s3 = boto3.resource('s3')

try:
    # download as a local file
    s3.Bucket(BUCKET_NAME).download_file(KEY, 'my_local_image.jpg')
    # OR read directly into memory as bytes:
    # data = s3.Object(BUCKET_NAME, KEY).get()['Body'].read()
except botocore.exceptions.ClientError as e:
    if e.response['Error']['Code'] == "404":
        print("The object does not exist.")
    else:
        raise
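Applied to the question, a minimal sketch (assuming the CSVs sit at the root of the atwinebankloadrisk bucket under the same names):
import boto3
import pandas as pd

bucket = boto3.resource('s3').Bucket('atwinebankloadrisk')

# download each CSV to local disk, then read it with pandas
for name in ['application_test.csv', 'application_train.csv', 'bureau_balance.csv']:
    bucket.download_file(name, name)

df_test = pd.read_csv('application_test.csv')
df_train = pd.read_csv('application_train.csv')
df_bureau = pd.read_csv('bureau_balance.csv')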
Upvotes: 3
Reputation: 100
You can load S3 data into an AWS SageMaker notebook using the sample code below. Make sure the Amazon SageMaker role has a policy attached that grants it access to S3.
[1] https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-roles.html
import pandas as pd
from sagemaker import get_execution_role

role = get_execution_role()
bucket = 'Your_bucket_name'
data_key = 'your_data_file.csv'
data_location = 's3://{}/{}'.format(bucket, data_key)

df = pd.read_csv(data_location)
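Applied to the bucket from the question, that would look something like this (assuming application_test.csv sits at the root of the bucket):
bucket = 'atwinebankloadrisk'
data_key = 'application_test.csv'
data_location = 's3://{}/{}'.format(bucket, data_key)

df_test = pd.read_csv(data_location)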
Upvotes: 6
Reputation: 2156
You can use s3fs (https://s3fs.readthedocs.io/en/latest/) to read S3 files directly with pandas. The code below is taken from here:
import os
import pandas as pd
from s3fs.core import S3FileSystem

os.environ['AWS_CONFIG_FILE'] = 'aws_config.ini'

s3 = S3FileSystem(anon=False)
key = 'path/to/your-csv.csv'
bucket = 'your-bucket-name'

df = pd.read_csv(s3.open('{}/{}'.format(bucket, key), mode='rb'))
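Note that s3fs has to be installed in the notebook environment (pip install s3fs); it is also the library pandas uses under the hood when you pass an s3:// URL to read_csv, as in the answers above.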
Upvotes: 1