Reputation: 361
I would like to list all unique file extensions in a directory tree, recursively. I tried the following code, but it didn't print anything. It works if I use a specific extension such as ".m4a". Besides, even if this code worked with ".*", it would list every file, whereas I only want the list of unique extensions. I can't write out a list of extensions and search for them, because I don't know which file types exist.
for file in os.listdir(root):
    if file.endswith(".*"):
        print(os.path.join(root, file))
This question has been asked at "How can I find all of the distinct file extensions in a folder hierarchy?", but it didn't help because it is for a Linux machine.
Upvotes: 3
Views: 2789
Reputation: 2368
Adding to the answer https://stackoverflow.com/a/54077718/8942966 by Patrick: you can get the number of files associated with each extension using the following.
import os
from collections import Counter
extensions = [os.path.splitext(f)[1] for _, _, files in os.walk('.') for f in files]
print(Counter(extensions))
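As a minimal sketch of the same counting idea wrapped in a reusable function (the name `count_extensions` is hypothetical, not from the answer above), one could write:

```python
import os
from collections import Counter


def count_extensions(root):
    """Count files per extension under root (hypothetical helper name)."""
    return Counter(
        os.path.splitext(name)[1]  # '' for files without an extension
        for _dirpath, _dirnames, filenames in os.walk(root)
        for name in filenames
    )
```

Calling `count_extensions('.').most_common()` should then give the extensions ordered from most to least frequent, and the keys of the returned `Counter` are the unique extensions themselves.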
Upvotes: 0
Reputation: 25895
That other question is not about Python anyway. One way to do this is to walk the path with os.walk, which recursively enters subdirectories, and add the file types to a set:
import os
exts = set(f.split('.')[-1] for dir,dirs,files in os.walk('.') for f in files if '.' in f)
Use [-1] after splitting to extract the last part, in case the filename itself contains a '.'. Use if '.' in f to make sure the file actually has an extension.
Having mulled it over, my insistence on not using splitext seems unwarranted; it's much cleaner:
import os
exts = set(os.path.splitext(f)[1] for dir,dirs,files in os.walk('.') for f in files)
This returns an empty string as the extension for files that have none.
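The two approaches in this answer differ on a few edge cases. The sketch below compares them on some made-up filenames; the helper names `ext_by_split` and `ext_by_splitext` are hypothetical and exist only for illustration.

```python
import os


def ext_by_split(name):
    # First approach: text after the last dot, or None if there is no dot.
    return name.split(".")[-1] if "." in name else None


def ext_by_splitext(name):
    # Second approach: os.path.splitext keeps the leading dot and
    # treats a name that only starts with a dot as having no extension.
    return os.path.splitext(name)[1]


for name in ["song.m4a", "archive.tar.gz", "README", ".bashrc"]:
    print(name, ext_by_split(name), repr(ext_by_splitext(name)))
```

Note in particular that a dotfile such as `.bashrc` is reported as having the extension `bashrc` by the split approach, while `os.path.splitext` reports an empty extension for it.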
Upvotes: 4
Reputation: 51643
Your code only looks for files whose names (literally) end in .* - instead, simply do:
import os
extensions = set()
my_root = "./"  # some dir to start in
for root, dirs, files in os.walk(my_root):
    for file in files:
        pathname, exten = os.path.splitext(file)
        extensions.add(exten)
print(extensions)  # or print(list(extensions)) if you want a list afterwards
Putting the extensions into a set makes them unique.
If you want a (long) one-liner, see kabanus' answer: same logic, but as a set comprehension and hence slightly faster (not that it matters much ;o)
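The point about `endswith` matching literally can be checked directly. This is a small sketch with a made-up filename; `fnmatch` is shown only to contrast literal matching with shell-style wildcards, it is not needed for the solution above.

```python
from fnmatch import fnmatch

name = "song.m4a"  # made-up example filename

# str.endswith compares literal characters: it looks for '.' followed by '*'.
literal = name.endswith(".*")

# fnmatch applies shell-style wildcards, so '*' matches any run of characters.
wildcard = fnmatch(name, "*.*")

# endswith also accepts a tuple when the suffixes are known in advance.
several = name.endswith((".m4a", ".mp3"))

print(literal, wildcard, several)
```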
Upvotes: 1
Reputation: 91
You may try something like:
from os import path
from glob import glob
root = '/tmp'
exts = set()
for file_ in glob(root + '/**/*.*', recursive=True):
    exts.add(path.splitext(file_)[-1])
After the loop, exts contains all the unique extensions.
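One caveat with the glob pattern: `*.*` also matches directories whose names contain a dot, and by default `glob` does not match names that begin with a dot. A hedged variant that keeps only regular files might look like the following, where `extensions_via_glob` is a hypothetical helper name:

```python
import os
from glob import glob


def extensions_via_glob(root):
    """Collect extensions of regular files matched by '**/*.*' (hypothetical helper)."""
    pattern = os.path.join(root, "**", "*.*")
    return {
        os.path.splitext(path)[1]
        for path in glob(pattern, recursive=True)
        if os.path.isfile(path)  # skip directories such as 'photos.old'
    }
```

Files without any dot in their name are never matched by this pattern, so unlike the `os.walk` approaches the result contains no empty extension.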
Upvotes: 0
Reputation: 3524
The * character is normally interpreted and expanded by the shell. To access similar functionality in Python, you can use the glob module of the standard library. Here's an example doing what you want to achieve:
from glob import glob
# [-1] takes the text after the last dot; glob('*.*') only covers the current directory
extensions = set(filename.split('.')[-1] for filename in glob('*.*'))
for extension in extensions:
    print(extension)
Upvotes: 0