gperna

Reputation: 23

How to extract filename from path using regex

I would like to extract a filename from a path using a regular expression:

mysting = '/content/drive/My Drive/data/happy (463).jpg'

How do I extract 'happy.jpg'?

I have tried this: '[^/]*$' but the result still includes the number in parentheses, which I do not want: 'happy (463).jpg'

How could I improve it?

Upvotes: 2

Views: 814

Answers (3)

C.Moon

Reputation: 48

I use JavaScript, so here is how I would do it there:

const myString="happy (463).jpg";

const result=myString.replace(/\s\(\d*\)/,'');

After you split the path on the slash separator, you can apply this code to the last segment.
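Since the rest of the thread uses Python, the same split-then-replace idea could be sketched as follows. The function name strip_counter is hypothetical, and count=1 is used to mirror JavaScript's replace without the g flag, which only replaces the first match.

```python
import re

def strip_counter(path):
    # Keep only the part after the last slash
    filename = path.split('/')[-1]
    # Drop the first " (digits)" group, like the JavaScript replace above
    return re.sub(r'\s\(\d*\)', '', filename, count=1)

print(strip_counter('/content/drive/My Drive/data/happy (463).jpg'))  # happy.jpg
```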

Upvotes: -1

heemayl

Reputation: 42017

Without regex, using the str methods str.partition and str.rpartition:

In [185]: filename = mysting.rpartition('/')[-1] 

In [186]: filename 
Out[186]: 'happy (463).jpg'

In [187]: f"{filename.partition(' ')[0]}.{filename.rpartition('.')[-1]}"
Out[187]: 'happy.jpg'
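As a rough sketch, the two steps above can be wrapped in a small function (short_name is a hypothetical name). It assumes the base name itself contains no spaces, because everything after the first space is dropped.

```python
def short_name(path):
    # Everything after the last slash
    filename = path.rpartition('/')[-1]
    # Text before the first space, and text after the last dot
    stem = filename.partition(' ')[0]
    ext = filename.rpartition('.')[-1]
    return f"{stem}.{ext}"

print(short_name('/content/drive/My Drive/data/happy (463).jpg'))  # happy.jpg
print(short_name('/data/happy face (1).jpg'))  # happy.jpg, "face" is lost
```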

With regex, using re.sub:

re.sub(r'.*/(?!.*/)([^\s]+)[^.]+(\..*)', r'\1\2', mysting)
  • .*/ greedily matches up to the last /

  • The zero-width negative lookahead (?!.*/) ensures there is no / anywhere further on

  • ([^\s]+) matches up to the next whitespace character and captures that text as the first group

  • [^.]+ matches up to the next .

  • (\..*) matches a literal . followed by any number of characters and captures them as the second group; if you want to match more conservatively, you could require exactly 3 characters or even the literal .jpg

  • in the replacement, only the captured groups are used

Example:

In [183]: mysting = '/content/drive/My Drive/data/happy (463).jpg'

In [184]: re.sub(r'.*/(?!.*/)([^\s]+)[^.]+(\..*)', r'\1\2', mysting)
Out[184]: 'happy.jpg'

Upvotes: 1

The fourth bird

Reputation: 163352

You could use 2 capturing groups. First match up to the last / and then capture 1+ word chars in group 1.

Then match 1+ digits between parentheses and capture .jpg in group 2, asserting the end of the string after it.

^.*/(\w+)\s*\(\d+\)(\.jpg)$

In parts, the pattern matches:

  • ^.*/ Match until the last /
  • (\w+) Capture group 1, match 1+ word chars
  • \s* Match 0+ whitespace chars
  • \(\d+\) Match 1+ digits between parentheses
  • (\.jpg) Capture group 2, match .jpg
  • $ End of string


Then use group 1 and group 2 in the replacement to get happy.jpg

import re

regex = r"^.*/(\w+)\s*\(\d+\)(\.jpg)$"
test_str = "/content/drive/My Drive/data/happy (463).jpg"
result = re.sub(regex, r"\1\2", test_str, count=1)

if result:
    print(result)

Output

happy.jpg
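Note that re.sub returns the input unchanged when the pattern does not match, so the `if result:` check above cannot tell a match from a non-match. A possible alternative, sketched here with the hypothetical names PATTERN and clean_name, is to match first and build the name from the groups:

```python
import re

PATTERN = re.compile(r"^.*/(\w+)\s*\(\d+\)(\.jpg)$")

def clean_name(path):
    # Return None when the path does not have the "name (digits).jpg" shape
    match = PATTERN.match(path)
    if match is None:
        return None
    return match.group(1) + match.group(2)

print(clean_name("/content/drive/My Drive/data/happy (463).jpg"))  # happy.jpg
print(clean_name("/content/drive/My Drive/data/happy.jpg"))  # None
```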

Upvotes: 2
