Reputation: 6440
I am trying to find the extension of a file, given its name as a string. I know I can use the function os.path.splitext, but it does not work as expected when my file extension is .tar.gz or .tar.bz2: it gives the extensions as gz and bz2 instead of tar.gz and tar.bz2 respectively.
So I decided to find the extension of files myself using pattern matching.
print re.compile(r'^.*[.](?P<ext>tar\.gz|tar\.bz2|\w+)$').match('a.tar.gz').group('ext')
>>> gz # I want this to come as 'tar.gz'
print re.compile(r'^.*[.](?P<ext>tar\.gz|tar\.bz2|\w+)$').match('a.tar.bz2').group('ext')
>>> bz2 # I want this to come as 'tar.bz2'
I am using (?P<ext>...) in my pattern matching as I also want to get the extension.
Please help.
Upvotes: 8
Views: 24115
Reputation: 1433
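This is simple and works for both single and multiple extensions. Note that it returns the file name without its extension, and it assumes the name itself contains no dots: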
In [1]: '/folder/folder/folder/filename.tar.gz'.split('/')[-1].split('.')[0]
Out[1]: 'filename'
In [2]: '/folder/folder/folder/filename.tar'.split('/')[-1].split('.')[0]
Out[2]: 'filename'
In [3]: 'filename.tar.gz'.split('/')[-1].split('.')[0]
Out[3]: 'filename'
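The same split idea can also return the extension itself, which is what the question asks for. A minimal sketch, assuming the extension is everything after the first dot of the base name and that paths use '/' as separator; split_first_dot is a hypothetical helper name, not something from the answer:

```python
def split_first_dot(path):
    """Split a path into (stem, extension) at the first dot of the base name."""
    base = path.split('/')[-1]             # drop any directory part ('/' separators assumed)
    stem, _dot, ext = base.partition('.')  # ext is '' when the base name has no dot
    return stem, ext
```

Because it cuts at the first dot, a name such as v1.2.tar.gz would be split too early, so this only suits names whose stem contains no dots.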
Upvotes: 2
Reputation: 11
Continuing from phihag's answer: to generically collect all double or triple extensions, as in CropQDS275.jpg.aux.xml, keep splitting while '.' is in the remaining name:
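import os
tempfilename, file_extension = os.path.splitext(filename)
while '.' in tempfilename:
    tempfilename, tempfile_extension = os.path.splitext(tempfilename)
    if not tempfile_extension:  # e.g. '.bashrc' or a dotted directory: stop instead of looping forever
        break
    file_extension = tempfile_extension + file_extension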
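For comparison, the loop above can probably be replaced by pathlib on Python 3. A minimal sketch, assuming Python 3.4 or later; all_extensions is a hypothetical helper name and not part of the original answer:

```python
from pathlib import PurePath

def all_extensions(filename):
    """Return every suffix of the final path component, joined into one string."""
    # PurePath.suffixes only inspects the last component, so dotted
    # directory names do not leak into the result.
    return ''.join(PurePath(filename).suffixes)
```

Like the loop, this treats every dot-separated part after the first as an extension, so a versioned name such as v1.2.tar.gz would yield '.2.tar.gz'.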
Upvotes: 1
Reputation: 6175
Starting from phihag's answer:
import os

DOUBLE_EXTENSIONS = ['tar.gz', 'tar.bz2']  # Add extra extensions where desired.

def guess_extension(filename):
    """
    Guess the extension of the given filename; returns (root, ext).
    """
    root, ext = os.path.splitext(filename)
    if any(filename.endswith(x) for x in DOUBLE_EXTENSIONS):
        root, first_ext = os.path.splitext(root)
        ext = first_ext + ext
    return root, ext
Upvotes: 3
Reputation: 20419
I have an idea that is much easier than racking your brain over a regex, though it might sound naive:
name="filename.tar.gz"
extensions=('.tar.gz','.py')
[x for x in extensions if name.endswith(x)]
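The list comprehension above returns every known extension that matches, which may be more than one. A minimal sketch of one way to reduce that to a single answer, assuming the longest matching suffix is the one wanted and falling back to os.path.splitext for anything not in the list; KNOWN_EXTENSIONS and find_extension are hypothetical names:

```python
import os

# Illustrative list only; extend it with whatever extensions matter to you.
KNOWN_EXTENSIONS = ('.tar.gz', '.tar.bz2', '.gz', '.py')

def find_extension(name):
    """Return the longest known extension that name ends with."""
    matches = [x for x in KNOWN_EXTENSIONS if name.endswith(x)]
    if matches:
        return max(matches, key=len)
    # Unknown extension: fall back to the standard single-suffix split.
    return os.path.splitext(name)[1]
```

Picking the longest match means 'filename.tar.gz' resolves to '.tar.gz' even though '.gz' also matches.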
Upvotes: 3
Reputation: 9480
>>> print re.compile(r'^.*[.](?P<ext>tar\.gz|tar\.bz2|\w+)$').match('a.tar.gz').group('ext')
gz
>>> print re.compile(r'^.*?[.](?P<ext>tar\.gz|tar\.bz2|\w+)$').match('a.tar.gz').group('ext')
tar.gz
>>>
The ? after .* makes the quantifier non-greedy: it matches as little as possible. So instead of .* eating ".tar" as well, .*? stops at the shortest prefix that still allows tar.gz to be matched.
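The difference between the two patterns can be checked side by side. A minimal sketch, assuming Python 3; the names GREEDY, LAZY and ext_of are hypothetical and only wrap the two patterns shown above:

```python
import re

GREEDY = re.compile(r'^.*[.](?P<ext>tar\.gz|tar\.bz2|\w+)$')
LAZY = re.compile(r'^.*?[.](?P<ext>tar\.gz|tar\.bz2|\w+)$')

def ext_of(pattern, name):
    """Return the named group 'ext', or None when the pattern does not match."""
    m = pattern.match(name)
    return m.group('ext') if m else None
```

With the lazy pattern, a name with a single extension still works: the prefix simply grows until the remaining text fits one of the alternatives.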
Upvotes: 5
Reputation: 287755
import os

root, ext = os.path.splitext('a.tar.gz')
if ext in ['.gz', '.bz2']:
    ext = os.path.splitext(root)[1] + ext
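The snippet above could be wrapped in a reusable function. A minimal sketch, assuming you also want the root returned without the inner suffix; split_extension and COMPRESSION_SUFFIXES are hypothetical names:

```python
import os

COMPRESSION_SUFFIXES = ('.gz', '.bz2')  # taken from the snippet; any additions are your own assumption

def split_extension(filename):
    """Like os.path.splitext, but keeps one extra suffix in front of .gz/.bz2."""
    root, ext = os.path.splitext(filename)
    if ext in COMPRESSION_SUFFIXES:
        # Peel one more suffix off the root and prepend it to the extension.
        root, inner = os.path.splitext(root)
        ext = inner + ext
    return root, ext
```

Note that this keeps whatever suffix precedes the compression suffix, so 'notes.txt.gz' gives '.txt.gz' and not only names ending in '.tar.gz' are affected.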
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.
Upvotes: 21