Reputation: 2993
Here it is:
str_ = 'file_.csv_.csv.bz2'
re.sub(regex, '', str_)
I want a value for regex that returns 'file_.csv_', i.e. the file name without its actual extension, which here is '.csv.bz2' and in general could be '.csv' followed by an optional compression suffix: nothing, '.bz2', '.gz', '.7z', or any other compression format.
More precisely, I want re.sub to match as much as possible at the end of str_, but starting no earlier than the last '.csv'. With regex = '\.csv.*$' the match begins at the first '.csv', so I get only 'file_'.
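To illustrate the problem, here is a small sketch of what the naive pattern actually matches. The variable names first_match and stripped are only for illustration; the point is that the regex engine reports the leftmost match, so the first '.csv' wins.

```python
import re

str_ = 'file_.csv_.csv.bz2'

# re.search returns the leftmost match, which starts at the FIRST '.csv';
# the greedy '.*' then runs to the end of the string.
first_match = re.search(r'\.csv.*$', str_).group(0)

# Substituting that match away therefore removes too much.
stripped = re.sub(r'\.csv.*$', '', str_)

print(first_match)  # '.csv_.csv.bz2'
print(stripped)     # 'file_'
```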
I could of course call os.path.splitext(), check whether the remaining name ends with '.csv', and call os.path.splitext() again if so, but is there a shorter way?
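For reference, the longer os.path.splitext() route could look roughly like the sketch below. The helper name strip_csv_extension is hypothetical, and the sketch assumes at most one compression suffix after '.csv'.

```python
import os.path

def strip_csv_extension(name):
    """Drop a trailing '.csv', optionally followed by one more suffix."""
    root, ext = os.path.splitext(name)
    if ext == '.csv':
        # Plain '.csv' file: one split is enough.
        return root
    if root.endswith('.csv'):
        # Something like '.csv.bz2': split a second time to drop '.csv'.
        return os.path.splitext(root)[0]
    # Not a CSV name at all: leave it untouched.
    return name

print(strip_csv_extension('file_.csv_.csv.bz2'))  # 'file_.csv_'
```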
Upvotes: 1
Views: 113
Reputation: 1123400
You could use re.split() to split off the suffix:
result = re.split(r'\.csv(?:\.\w+)?$', filename)[0]
Demo:
>>> import re
>>> filename = 'file_.csv_.csv.bz2'
>>> re.split(r'\.csv(?:\.\w+)?$', filename)[0]
'file_.csv_'
>>> re.split(r'\.csv(?:\.\w+)?$', 'foobar_.csv_.csv')[0]
'foobar_.csv_'
>>> re.split(r'\.csv(?:\.\w+)?$', 'foobar_.csv_.csv.gz')[0]
'foobar_.csv_'
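Since the question asks about re.sub(), the same pattern should also work there. This is a minimal sketch; the names SUFFIX and strip_suffix are made up for the example. Because \w+ cannot match a dot, the optional group covers exactly one extra suffix, and the $ anchor forces the match onto the last '.csv'.

```python
import re

# '.csv', optionally followed by one more dot-separated suffix, at the end.
SUFFIX = re.compile(r'\.csv(?:\.\w+)?$')

def strip_suffix(filename):
    """Remove a trailing '.csv' or '.csv.<something>' from filename."""
    return SUFFIX.sub('', filename)

print(strip_suffix('file_.csv_.csv.bz2'))  # 'file_.csv_'
print(strip_suffix('archive.tar.gz'))      # 'archive.tar.gz' (no '.csv', unchanged)
```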
Upvotes: 2
Reputation: 174766
This removes all the consecutive trailing extensions and leaves only the file name:
>>> import re
>>> s = "file_.csv_.csv.bz2"
>>> m = re.sub(r'[.a-z0-9]+$', r'', s)
>>> m
'file_.csv_'
>>> s = "foobar_.csv_.csv.gz"
>>> m = re.sub(r'[.a-z0-9]+$', r'', s)
>>> m
'foobar_.csv_'
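One caveat worth checking before relying on this: the character class also matches the base name itself when it consists only of lowercase letters, digits and dots, and it does not match uppercase extensions. A small sketch, with a hypothetical helper name:

```python
import re

def strip_trailing_extensions(name):
    """Remove the trailing run of dots, lowercase letters and digits."""
    return re.sub(r'[.a-z0-9]+$', '', name)

# Works here because '_' stops the character class.
print(strip_trailing_extensions('file_.csv_.csv.bz2'))  # 'file_.csv_'

# Every character is in the class, so the whole name is removed.
print(repr(strip_trailing_extensions('data.csv.gz')))   # ''

# Uppercase letters are not in the class, so nothing is removed.
print(strip_trailing_extensions('Report.CSV'))          # 'Report.CSV'
```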
Upvotes: 0