anaz8

Reputation: 125

AWS Glue Python FileNotFoundError: [Errno 2] No such file or directory

I am trying to use AWS Glue to move files between S3 buckets in different accounts. I am using a Glue Python shell job. I have list and get-object permissions on the source bucket. I can list all the files, but when I try to load them into the destination bucket I get the error "FileNotFoundError: [Errno 2] No such file or directory: 'test/f1=x/type=b/file1.parquet'".

The files in the source S3 bucket are partitioned:

test/f1=x/type=a/file1.parquet
test/f1=x/type=a/file2.parquet
test/f1=x/type=b/file1.parquet
test/f1=x/type=b/file2.parquet

I am only trying to load the files with f1=x and type=b:

import pandas as pd 
import boto3
         
client = boto3.client('s3')
bucket = 'mysourcebucketname' 
folder_path = 'test/f1=x/type=b/'
       
def my_keys(bucket,folder_path):
    keys = []
    resp = client.list_objects(Bucket=bucket, Prefix=folder_path)
    for obj in resp['Contents']:
        keys.append(obj['Key'])
    return keys
           
files = my_keys(bucket,folder_path)
#print(files)
     
for file in files:
    bucketdest = 'mydestinationbucket'
    new_file_name = file.split('/')[-1]
    s3_file = 'destfolder1/destfolder2/'+"typeb"+new_file_name
    client.upload_file(file,bucketdest,s3_file,ExtraArgs={'GrantFullControl':'id =""'})

Upvotes: 1

Views: 1892

Answers (2)

dashsidd1

Reputation: 16

This can be implemented using:

from boto3.session import Session

def move_files(BUCKET, SOURCE, DESTINATION, FILENAME):
    # Replace the placeholders with your own credentials.
    session = Session(aws_access_key_id='<Access_ID>',
                      aws_secret_access_key='<Secret Key>')
    s3_resource = session.resource('s3')
    destination_key = DESTINATION + FILENAME
    source_key = SOURCE + FILENAME
    try:
        # Copy the object to the new key, then delete the original.
        s3_resource.Object(BUCKET, destination_key).copy_from(
            CopySource=BUCKET + '/' + source_key)
        s3_resource.Object(BUCKET, source_key).delete()
    except Exception as error:
        print(error)

Also make sure that your IAM user has access to both buckets.

Upvotes: 0

Marcin

Reputation: 238081

upload_file is for uploading from local drive to S3. So your code is looking for a local file called test/f1=x/type=b/file1.parquet, which obviously does not exist, because it is on S3 as you wrote. Maybe you want to download these files instead?
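
If the goal is to move the objects rather than keep a local copy, one alternative to downloading and re-uploading is a server-side copy with the client's `copy` method. The following is only a sketch under some assumptions: the role running the Glue job can read the source bucket and write to the destination bucket, the bucket names and prefix are the ones from the question, and `dest_key_for`, `copy_prefix` and `main` are hypothetical helper names, not part of boto3.

```python
def dest_key_for(source_key, dest_prefix="destfolder1/destfolder2/", tag="typeb"):
    """Build the destination key the same way the question's loop does."""
    file_name = source_key.split("/")[-1]
    return dest_prefix + tag + file_name


def copy_prefix(client, source_bucket, prefix, dest_bucket, extra_args=None):
    """Copy every object under `prefix` from one bucket to another.

    The copy happens inside S3, so no local file path is involved.
    Returns the list of destination keys that were written.
    """
    copied = []
    # The paginator follows continuation tokens, so listings with more
    # than 1000 keys are handled as well.
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=source_bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.endswith("/"):
                # Skip zero-byte "folder" placeholder objects.
                continue
            new_key = dest_key_for(key)
            client.copy(
                {"Bucket": source_bucket, "Key": key},
                dest_bucket,
                new_key,
                ExtraArgs=extra_args,
            )
            copied.append(new_key)
    return copied


def main():
    # Imported here so the helpers above can be exercised without AWS access.
    import boto3

    s3 = boto3.client("s3")
    # Bucket names and prefix are taken from the question.
    copy_prefix(s3, "mysourcebucketname", "test/f1=x/type=b/", "mydestinationbucket")
```

Calling `main()` from the Glue job would run the copy. If the destination account needs ownership of the copied objects, the `ExtraArgs` dictionary from the question can be passed through as `extra_args`; whether that is required depends on how the destination bucket is configured.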

Upvotes: 1
