Richlewis

Reputation: 15374

Determine if folder or file key - Boto

Using boto and Python, I am trying to determine whether a key refers to a folder or a file (I am aware that S3 treats both exactly the same, since I am not dealing with a filesystem directly).

At the moment I have two keys:

<Key: my-folder,output/2019/01/28/>
<Key: my-folder,output/2019/01/28/part_1111>

The first is a "folder" and the second is a "file". I would like to determine whether a key is a "file", but I am unsure how to do this. The obvious test is that the key does not end in a /, but how would I check that in Python?

If I am iterating over a list(), can I turn the key into a string or access the key's attributes?

for obj in srcBucket.list():
    # Get the Key object of the given key, in the bucket
    k = Key(srcBucket, obj.key)
    print(k)

Output:

<Key: my-folder,output/2019/01/28/>
<Key: my-folder,output/2019/01/28/part_1111>

Upvotes: 4

Views: 13528

Answers (4)

DrStrangepork

Reputation: 3134

You need to check both the 'Key' and the 'Size' of each object:

import boto3

for obj in boto3.client('s3').list_objects_v2(Bucket='my-bucket')['Contents']:
  if obj['Size'] == 0 and obj['Key'].endswith('/'):
    print(f"Folder: {obj['Key']}, (size: {obj['Size']})")
  elif obj['Size'] == 0:
    print(f"Zero-length file: {obj['Key']}, (size: {obj['Size']})")
  else:
    print(f"File: {obj['Key']}, (size: {obj['Size']})")

Output:

File: a_folder/a_file.txt, (size: 543)
Folder: a_folder/, (size: 0)
Zero-length file: a_folder/zero_length_file.txt, (size: 0)
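
If you want to reuse that check, here is a minimal sketch that pulls it into a small function. The name classify_entry is my own (hypothetical), and the sample entries are hand-written dicts that only mimic the 'Key' and 'Size' fields of the 'Contents' items, so nothing here talks to S3:

```python
def classify_entry(entry):
    """Classify one item shaped like a list_objects_v2 'Contents' entry."""
    key, size = entry["Key"], entry["Size"]
    if size == 0 and key.endswith("/"):
        return "folder"            # zero-length placeholder ending in "/"
    if size == 0:
        return "zero-length file"  # empty object that is not a placeholder
    return "file"


# Hand-written sample entries (assumed shape, not real S3 output)
sample = [
    {"Key": "a_folder/a_file.txt", "Size": 543},
    {"Key": "a_folder/", "Size": 0},
    {"Key": "a_folder/zero_length_file.txt", "Size": 0},
]

for entry in sample:
    print(f"{classify_entry(entry)}: {entry['Key']}")
```

With a real bucket you would pass each item of response['Contents'] to the function instead of the sample list.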

Note also that folder objects do not necessarily have ['ContentType'] == 'application/x-directory':

for s3_obj_summary in boto3.resource('s3').Bucket('my-bucket').objects.all():
  print(f"{s3_obj_summary.key}: {s3_obj_summary.get()['ContentType']}")

Output:

a_folder/: application/octet-stream

Upvotes: 2

Kanagavelu Sugumar

Reputation: 19260

With exception handling for the case where no files are found:

import boto3

s3_client = boto3.client('s3')
paginator = s3_client.get_paginator('list_objects_v2')
pages = paginator.paginate(Bucket="logs", Prefix="logs/access/2022122700/")
for page in pages:
    try:
        for obj in page['Contents']:
            print(obj['Key'])  # prints the object key
    except KeyError:
        # A page has no 'Contents' entry when nothing matches the prefix
        print("Doesn't exist")
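
As an alternative to the try/except, here is a rough sketch that uses dict.get() with an empty default and skips keys ending in "/". The helper name iter_file_keys is hypothetical, and the fake_pages list is hand-written to stand in for what the paginator yields, so treat it as an illustration rather than tested production code:

```python
def iter_file_keys(pages):
    """Yield keys that do not end in '/' from list_objects_v2-style pages."""
    for page in pages:
        # A page without 'Contents' simply contributes nothing.
        for obj in page.get("Contents", []):
            if not obj["Key"].endswith("/"):
                yield obj["Key"]


# Hand-written pages (assumed shape); the first one has no 'Contents'
fake_pages = [
    {"KeyCount": 0},
    {"Contents": [
        {"Key": "logs/access/2022122700/", "Size": 0},
        {"Key": "logs/access/2022122700/part_1", "Size": 120},
    ]},
]

for key in iter_file_keys(fake_pages):
    print(key)
```

With boto3 you would pass the pages object returned by paginator.paginate(...) in place of fake_pages.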

Upvotes: 0

pldimitrov

Reputation: 1727

The response returned by get() on an S3.Object or S3.ObjectSummary will contain the following entry:

'ContentType': 'application/x-directory'

if the key is a directory (provided whatever created the key set that content type).

import boto3
bucket = boto3.resource('s3').Bucket('my-bucket')  # placeholder bucket name
for s3_obj_summary in bucket.objects.all():
  if s3_obj_summary.get()['ContentType'] == 'application/x-directory':
    print(s3_obj_summary)

https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.ObjectSummary.get
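
Note that get() issues a full GET for every object. If you only need the content type, a sketch along these lines uses the client's head_object call, which returns the metadata without the body. The function name is_directory_marker is hypothetical, and stripping a "; charset=..." suffix is my assumption about how some tools may store the value:

```python
def is_directory_marker(s3_client, bucket, key):
    """Return True if the object's stored Content-Type marks it as a directory."""
    # head_object fetches metadata only, so the object body is not downloaded.
    response = s3_client.head_object(Bucket=bucket, Key=key)
    content_type = response.get("ContentType", "")
    # Ignore any parameters after ';' (assumed, e.g. a charset suffix).
    return content_type.split(";")[0].strip() == "application/x-directory"


# Usage with a real client (requires boto3 and AWS credentials):
#   import boto3
#   is_directory_marker(boto3.client("s3"), "my-bucket", "a_folder/")
```

This still only detects keys whose creator set that content type, so checking for a trailing / may be the more dependable test.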

Upvotes: 6

John Rotenstein

Reputation: 269101

You are correct that folders do not exist. You could, for example, create a file called output/2020/01/01/foo.txt even if none of those sub-folders exist.

Some systems, however, like to 'create' a folder by making a zero-length object with the name of the pretend folder. In such cases, you can identify the 'folder' by checking the length of the object.

Here's some sample code (using the boto3 client method):

import boto3

s3 = boto3.client('s3', region_name='ap-southeast-2')

response = s3.list_objects_v2(Bucket='my-bucket')

for obj in response['Contents']:
    if obj['Size'] == 0:
        # Print name of zero-size object
        print(obj['Key'])

Officially, there is no reason for such 'folder files' to exist. Amazon S3 will work perfectly well without them (and often better, for reasons you are discovering!).
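
Because folders are only implied by the key names, you can also work out the "folders" from the keys alone, whether or not any placeholder objects exist. A minimal sketch, with a hypothetical helper name implied_folders and a hand-written key list:

```python
def implied_folders(keys):
    """Return every 'folder' prefix implied by the '/' separators in the keys."""
    folders = set()
    for key in keys:
        parts = key.split("/")[:-1]  # drop the final component (the file name)
        for depth in range(1, len(parts) + 1):
            folders.add("/".join(parts[:depth]) + "/")
    return sorted(folders)


# Example key taken from the text above; no placeholder objects are needed
for folder in implied_folders(["output/2020/01/01/foo.txt"]):
    print(folder)
```

In practice the keys would come from the 'Key' field of each listed object.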

Upvotes: 5
