user1850133

Reputation: 2993

Python: strip a string from the end, as greedily as possible

Here is what I have:

import re

str_ = 'file_.csv_.csv.bz2'
re.sub(regex, '', str_)  # regex is the pattern I am looking for

I want a value for 'regex' that gives 'file_.csv_', i.e. the file name without the actual extension, which here is '.csv.bz2' and could be any '.csv.*', where .* is '', bz2, gz, 7z, ... or any other compression format.

More precisely, I want re.sub to match from the end of str_ and remove only the final extension. With regex = '\.csv.*$' I get only 'file_', because the match starts at the first '.csv' and .* consumes everything after it.

I could of course call os.path.splitext(), check whether the result ends with '.csv', and call os.path.splitext() again if so, but is there a shorter way?
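
For reference, the longer os.path.splitext() route could look roughly like this. It is only a sketch: strip_csv_extension is a hypothetical helper name, and it assumes at most one compression suffix follows '.csv'.

```python
import os.path

def strip_csv_extension(name):
    """Hypothetical helper: drop '.csv' plus at most one trailing extension."""
    root, ext = os.path.splitext(name)
    if ext == '.csv':
        # Plain '.csv' file, nothing else to remove.
        return root
    # ext may be a compression suffix such as '.bz2'; look one level deeper.
    inner_root, inner_ext = os.path.splitext(root)
    if inner_ext == '.csv':
        return inner_root
    # Not a (compressed) CSV name: leave it unchanged.
    return name

print(strip_csv_extension('file_.csv_.csv.bz2'))
```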

Upvotes: 1

Views: 113

Answers (2)

Martijn Pieters

Reputation: 1123400

You could use re.split() to split off the suffix:

result = re.split(r'\.csv(?:\.\w+)?$', filename)[0]

Demo:

>>> import re
>>> filename = 'file_.csv_.csv.bz2'
>>> re.split(r'\.csv(?:\.\w+)?$', filename)[0]
'file_.csv_'
>>> re.split(r'\.csv(?:\.\w+)?$', 'foobar_.csv_.csv')[0]
'foobar_.csv_'
>>> re.split(r'\.csv(?:\.\w+)?$', 'foobar_.csv_.csv.gz')[0]
'foobar_.csv_'
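
Since the question asks about re.sub specifically, the same pattern should also work there. The sketch below wraps it in a hypothetical helper, strip_suffix. The optional group only allows a single dot-separated suffix of word characters after '.csv', so an earlier '.csv_' in the name cannot satisfy the '$' anchor and is left alone.

```python
import re

# '.csv', optionally followed by one more '.suffix', anchored at the end.
SUFFIX_RE = re.compile(r'\.csv(?:\.\w+)?$')

def strip_suffix(name):
    """Hypothetical helper: remove a trailing '.csv' or '.csv.<ext>'."""
    return SUFFIX_RE.sub('', name)

print(strip_suffix('file_.csv_.csv.bz2'))
print(strip_suffix('foobar_.csv_.csv'))
```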

Upvotes: 2

Avinash Raj

Reputation: 174766

This removes the whole run of consecutive extensions at the end and leaves only the filename:

>>> import re
>>> s = "file_.csv_.csv.bz2"
>>> m = re.sub(r'[.a-z0-9]+$', r'', s)
>>> m
'file_.csv_'
>>> s = "foobar_.csv_.csv.gz"
>>> m = re.sub(r'[.a-z0-9]+$', r'', s)
>>> m
'foobar_.csv_'
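
One caveat worth checking before relying on this: the character class stops only at a character outside [.a-z0-9], which in the examples above is the underscore. A quick sketch of the difference, using a hypothetical helper strip_trailing_run and a made-up file name 'data.csv.gz':

```python
import re

def strip_trailing_run(name):
    """Hypothetical helper: remove the trailing run of '.', a-z and 0-9."""
    return re.sub(r'[.a-z0-9]+$', '', name)

# The underscore before the last '.csv' stops the match, as in the demo.
print(repr(strip_trailing_run('file_.csv_.csv.bz2')))

# Without such a separator the whole name is in the class and is removed.
print(repr(strip_trailing_run('data.csv.gz')))
```

So this pattern seems to fit only when the base name is known to end in a character outside the class.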

Upvotes: 0
