kutlus

Reputation: 361

How can I list the unique file extensions in a directory tree from a Python 3 script?

I would like to list all the unique file extensions in a directory tree. I tried the following code, but it didn't print anything. It works if I use a specific extension such as ".m4a". Besides, even if this code worked for ".*", it would list every file, whereas I only want the unique extensions. I can't write out a list of extensions and search for each one, because I don't know which file types exist.

for file in os.listdir(root):
    if file.endswith(".*"):
        print(os.path.join(root, file))

This has been asked before in "How can I find all of the distinct file extensions in a folder hierarchy?", but that question didn't help because it is about a Linux machine.

Upvotes: 3

Views: 2789

Answers (5)

codeslord

Reputation: 2368

This adds to Patrick's answer: https://stackoverflow.com/a/54077718/8942966

You can also count how many files have each extension:

import os
from collections import Counter

# Collect the extension of every file under the current directory
extensions = [os.path.splitext(f)[1] for root, dirs, files in os.walk('.') for f in files]

print(Counter(extensions))
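
As a rough sketch of how the counting idea could be wrapped for reuse: the helper name count_extensions is hypothetical, and lower-casing the extension (so that ".JPG" and ".jpg" are counted together) is an assumption that is not part of the answer above.

```python
import os
from collections import Counter


def count_extensions(top):
    """Count files per extension under `top` (hypothetical helper)."""
    counts = Counter()
    for _root, _dirs, files in os.walk(top):
        for name in files:
            # splitext returns (stem, ext); ext is '' when there is no extension.
            # Lower-casing is an assumption, so '.TXT' and '.txt' share one key.
            counts[os.path.splitext(name)[1].lower()] += 1
    return counts
```

Calling count_extensions('.').most_common(5) would then list the five most frequent extensions together with their file counts.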

Upvotes: 0

kabanus

Reputation: 25895

That other question is not about Python anyway. One way to do this is to walk the path with os.walk, which recursively enters subdirectories, and add the file extensions to a set:

import os
exts = set(f.split('.')[-1] for dir,dirs,files in os.walk('.') for f in files if '.' in f)

Use [-1] after splitting to extract the last part, in case the filename itself contains a '.'.

Use if '.' in f to make sure the file actually has an extension.

After mulling it over, my insistence on not using splitext seems unwarranted; it is much cleaner:

import os
exts = set(os.path.splitext(f)[1] for dir,dirs,files in os.walk('.') for f in files)

This yields an empty string for files with no extension.
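
To make the difference between the two approaches concrete, here is a small comparison on a few made-up filenames. The helper names ext_by_split and ext_by_splitext are hypothetical and only wrap the two expressions used above.

```python
import os


def ext_by_split(name):
    """Last dot-separated part, or None when the name has no dot."""
    return name.split(".")[-1] if "." in name else None


def ext_by_splitext(name):
    """Extension as reported by os.path.splitext, including the dot."""
    return os.path.splitext(name)[1]


# Illustrative names only; none of them need to exist on disk.
for name in ["song.m4a", "archive.tar.gz", ".bashrc", "README", "notes."]:
    print(f"{name!r}: split -> {ext_by_split(name)!r}, "
          f"splitext -> {ext_by_splitext(name)!r}")
```

The two disagree on dotfiles and trailing dots: for ".bashrc" the split version reports "bashrc" while splitext reports an empty string, and for "notes." the split version reports an empty string while splitext reports ".".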

Upvotes: 4

Patrick Artner

Reputation: 51643

You are only matching files whose names literally end in ".*". Instead, simply do:

import os

extensions = set()
my_root = "./"  # some dir to start in

for root, dirs, files in os.walk(my_root):
    for file in files:
        pathname, exten = os.path.splitext(file)
        extensions.add(exten)

print(extensions) # or print(list(extensions)) if you want a list afterwards

Putting the extensions into a set makes them unique.

If you want a (long) one-liner, see kabanus's answer: it uses the same logic, but as a set comprehension and hence slightly faster, not that it matters much ;o)
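
For comparison, the same walk can be sketched with pathlib, which none of the answers here use. The function name unique_extensions is hypothetical, and returning a sorted list rather than a set is just a choice made for readable output.

```python
from pathlib import Path


def unique_extensions(top):
    """Return the sorted unique file extensions under `top` (hypothetical helper)."""
    exts = set()
    # rglob('*') yields every entry below `top`, directories included
    for entry in Path(top).rglob("*"):
        if entry.is_file():
            # .suffix is '' for names without an extension and for dotfiles
            exts.add(entry.suffix)
    return sorted(exts)
```

The is_file() check matters because a directory named, say, "pkg.v2" would otherwise contribute ".v2" as an extension.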

Upvotes: 1

Baco

Reputation: 91

You may try something like:

from os import path
from glob import glob
root = '/tmp'
exts = set()
for file_ in glob(root + '/**/*.*', recursive=True):
    exts.add(path.splitext(file_)[-1])

Afterwards, exts contains all the unique extensions.
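
One caveat with the '*.*' pattern is that it also matches directories whose names contain a dot, and it skips hidden files and files without any extension. A minimal sketch of the same approach that at least filters out directories might look like this; the function name glob_extensions is hypothetical.

```python
import os
from glob import glob


def glob_extensions(root):
    """Unique extensions of regular files matched by '**/*.*' under `root`."""
    exts = set()
    pattern = os.path.join(root, "**", "*.*")
    for path in glob(pattern, recursive=True):
        # Skip directories such as 'v1.2', which also match '*.*'
        if os.path.isfile(path):
            exts.add(os.path.splitext(path)[1])
    return exts
```

Files with no dot in their name never match the pattern, so unlike the os.walk versions this set will not contain an empty string.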

Upvotes: 0

Marcus

Reputation: 3524

The * character is normally interpreted and expanded by the shell. To get similar functionality in Python, you can use the glob module of the standard library. Here's an example doing what you want to achieve:

from glob import glob

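# Take the last dot-separated part; '**' with recursive=True also searches subdirectories
extensions = set(filename.split('.')[-1] for filename in glob('**/*.*', recursive=True))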

for extension in extensions:
    print(extension)
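
The reason the original attempt printed nothing can be shown in a few lines: str.endswith compares literal characters, whereas wildcard matching is provided by functions such as fnmatch (or glob). The filename below is only an illustration.

```python
from fnmatch import fnmatch

name = "song.m4a"  # illustrative filename

literal_star = name.endswith(".*")    # looks for the two characters '.' and '*'
literal_m4a = name.endswith(".m4a")   # plain suffix comparison
wildcard = fnmatch(name, "*.*")       # shell-style pattern matching

print(literal_star, literal_m4a, wildcard)  # False True True
```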

Upvotes: 0
