Reputation: 23
I would like to extract a filename from a path using a regular expression:
mysting = '/content/drive/My Drive/data/happy (463).jpg'
How do I extract 'happy.jpg'?
I have tried this: '[^/]*$'
but the result still includes the number in parentheses, which I do not want: 'happy (463).jpg'
How could I improve it?
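For reference, here is a quick check of what the attempted pattern returns, assuming Python's standard `re` module. '[^/]*$' takes the longest run of non-slash characters that ends at the end of the string, which is the whole last path segment, so the " (463)" part is kept.

```python
import re

mysting = '/content/drive/My Drive/data/happy (463).jpg'

# '[^/]*$' matches every non-slash character after the last '/' up to the
# end of the string, i.e. the complete file name including " (463)".
match = re.search(r'[^/]*$', mysting)
print(match.group())  # happy (463).jpg
```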
Upvotes: 2
Views: 814
Reputation: 48
I use JavaScript, so here is the JavaScript version:
const myString = "happy (463).jpg";
const result = myString.replace(/\s\(\d*\)/, '');
After you split the path on the slash separator, apply this code to the last segment (the file name).
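A rough Python equivalent of the same two steps (split on '/', then strip the parenthesized number) might look like this. It assumes the number is always preceded by a single whitespace character, as in the JavaScript pattern.

```python
import re

mysting = '/content/drive/My Drive/data/happy (463).jpg'

# Step 1: split the path on '/' and keep the last segment (the file name).
filename = mysting.split('/')[-1]  # 'happy (463).jpg'

# Step 2: remove one whitespace character followed by '(digits)', mirroring
# the JavaScript pattern /\s\(\d*\)/. Like replace() without the g flag,
# count=1 removes only the first occurrence.
result = re.sub(r'\s\(\d*\)', '', filename, count=1)
print(result)  # happy.jpg
```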
Upvotes: -1
Reputation: 42017
Without regex, using str methods (str.partition and str.rpartition):
In [185]: filename = mysting.rpartition('/')[-1]
In [186]: filename
Out[186]: 'happy (463).jpg'
In [187]: f"{filename.partition(' ')[0]}.{filename.rpartition('.')[-1]}"
Out[187]: 'happy.jpg'
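The same string-method idea can also be sketched with the standard library's pathlib.PurePosixPath in place of rpartition('/'). This is a swapped-in technique, not part of the answer above, and like the partition(' ') approach it assumes the name has the form "name (number).ext" with no spaces inside "name".

```python
from pathlib import PurePosixPath

mysting = '/content/drive/My Drive/data/happy (463).jpg'

path = PurePosixPath(mysting)
# path.stem is the last component without its suffix: 'happy (463)'
# path.suffix is the last extension including the dot: '.jpg'
base = path.stem.partition(' ')[0]  # text before the first space: 'happy'
result = base + path.suffix
print(result)  # happy.jpg
```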
With regex, using re.sub (after import re):
re.sub(r'.*/(?!.*/)([^\s]+)[^.]+(\..*)', r'\1\2', mysting)
.*/ greedily matches up to the last /
The zero-width negative lookahead (?!.*/) ensures there is no / anywhere further on
([^\s]+) matches up to the next whitespace and puts it in the first captured group
[^.]+ matches up to the next .
(\..*) matches a literal . followed by any number of characters and puts it in the second captured group; if you want to match more conservatively, e.g. exactly three characters or even a literal .jpg, you can do that too
In the replacement, only the captured groups are used
Example:
In [183]: mysting = '/content/drive/My Drive/data/happy (463).jpg'
In [184]: re.sub(r'.*/(?!.*/)([^\s]+)[^.]+(\..*)', r'\1\2', mysting)
Out[184]: 'happy.jpg'
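The breakdown above can be sketched as a single commented pattern using re.VERBOSE (whitespace and # comments in the pattern are ignored), along with the more conservative variant that only accepts a literal .jpg at the end. Both are illustrations built from the pattern in this answer.

```python
import re

mysting = '/content/drive/My Drive/data/happy (463).jpg'

# Same pattern as in the answer, written in verbose mode so each piece
# carries its explanation.
pattern = re.compile(r"""
    .*/          # greedily match up to the last '/'
    (?!.*/)      # negative lookahead: no '/' anywhere further on
    ([^\s]+)     # group 1: non-whitespace characters, e.g. 'happy'
    [^.]+        # everything up to the next '.', e.g. ' (463)'
    (\..*)       # group 2: a literal '.' and the rest, e.g. '.jpg'
""", re.VERBOSE)

print(pattern.sub(r'\1\2', mysting))  # happy.jpg

# More conservative variant: group 2 must be a literal '.jpg' at the end.
strict = re.compile(r'.*/(?!.*/)([^\s]+)[^.]+(\.jpg)$')
print(strict.sub(r'\1\2', mysting))  # happy.jpg
```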
Upvotes: 1
Reputation: 163352
You could use two capturing groups. Match up to the last / and capture 1+ word chars in group 1.
Then match 1+ digits between parentheses, and capture .jpg followed by the end of the string in group 2.
^.*/(\w+)\s*\(\d+\)(\.jpg)$
In parts, the pattern matches:
^.*/
Match until the last /
(\w+)
Capture group 1, match 1+ word chars
\s*
Match 0+ whitespace chars
\(\d+\)
Match 1+ digits between parentheses
(\.jpg)
Capture group 2, match .jpg
$
End of string
Then use group 1 and group 2 in the replacement to get happy.jpg.
import re
regex = r"^.*/(\w+)\s*\(\d+\)(\.jpg)$"
test_str = "/content/drive/My Drive/data/happy (463).jpg"
# Replace the whole match with group 1 + group 2 (at most one replacement)
result = re.sub(regex, r"\1\2", test_str, count=1)
if result:
    print(result)
Output
happy.jpg
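Note that re.sub returns the input unchanged when the pattern does not match, so `if result:` is true for any non-empty path. A minimal sketch that checks for a match explicitly with re.match instead; the helper name clean_filename is hypothetical, and returning None for paths that do not fit is an assumption about what the caller wants.

```python
import re

# Same pattern as above: group 1 = base name, group 2 = '.jpg'.
PATTERN = re.compile(r"^.*/(\w+)\s*\(\d+\)(\.jpg)$")

def clean_filename(path):
    """Hypothetical helper: return 'name.jpg', or None if the path doesn't fit."""
    m = PATTERN.match(path)
    if m is None:
        return None
    return m.group(1) + m.group(2)

print(clean_filename("/content/drive/My Drive/data/happy (463).jpg"))  # happy.jpg
# \w+ does not match '-', so this name is rejected instead of passed through.
print(clean_filename("/content/drive/My Drive/data/very-happy (1).jpg"))  # None
```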
Upvotes: 2