Ayushman Buragohain

Reputation: 129

Python regular expression to extract file_name from a given directory path

I have some folders like so:

../dog_breeds/images/Images/n02085620-Chihuahua
../dog_breeds/images/Images/n02085782-Japanese_spaniel
../dog_breeds/images/Images/n02086910-papillon
../dog_breeds/images/Images/n02088466-bloodhound
....
....

I want to extract only the names (Chihuahua, Japanese_spaniel, papillon, bloodhound) from these paths using Python. Can anybody help me?

Upvotes: 1

Views: 34

Answers (2)

Rakesh

Reputation: 82755

You can use str.split here.

Example:

s = """../dog_breeds/images/Images/n02085620-Chihuahua
../dog_breeds/images/Images/n02085782-Japanese_spaniel
../dog_breeds/images/Images/n02086910-papillon
../dog_breeds/images/Images/n02088466-bloodhound
"""

for p in s.splitlines():
    print(p.split("-")[-1])

If you need a regex:

import re

for p in s.splitlines():
    print(re.search(r"\-(\w+)$", p).group(1))

Output:

Chihuahua
Japanese_spaniel
papillon
bloodhound
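One caveat with both snippets above: split("-")[-1] and the \w+ pattern only return the text after the last hyphen, so a name that itself contains a hyphen would be cut short. The following is a minimal sketch of a variant that splits on the first hyphen of the last path component instead. The helper name breed_from_path and the third path are hypothetical and are not taken from the question.

```python
import os


def breed_from_path(path):
    """Return the text after the first hyphen of the last path component."""
    # Last path component, e.g. "n02085620-Chihuahua".
    folder = os.path.basename(path.rstrip("/"))
    # partition splits on the first hyphen only, so hyphens in the name survive.
    _prefix, sep, name = folder.partition("-")
    # No hyphen at all: signal that with None instead of raising.
    return name if sep else None


paths = [
    "../dog_breeds/images/Images/n02085620-Chihuahua",
    "../dog_breeds/images/Images/n02085782-Japanese_spaniel",
    # Hypothetical entry (not from the question) with a hyphen inside the name.
    "../dog_breeds/images/Images/n00000000-curly-coated_retriever",
]

names = [breed_from_path(p) for p in paths]
print(names)
```

Returning None for a path without a hyphen is a design choice for this sketch; the regex version above would instead raise an AttributeError, because re.search returns None when nothing matches.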

Upvotes: 1

Ahx

Reputation: 7985

There are two common factors.

  • The image folder paths are the same

    • For instance: every path starts with ../dog_breeds/images/Images/
    • Use replace('../dog_breeds/images/Images/', '') to remove that leading path
  • Every folder name starts with a 10-character prefix

    • For instance: n02085620-, n02085782-
    • Use replace('../dog_breeds/images/Images/', '')[10:] to also slice off those 10 characters

If we combine the two factors:

res = ['../dog_breeds/images/Images/n02085620-Chihuahua',
       '../dog_breeds/images/Images/n02085782-Japanese_spaniel',
       '../dog_breeds/images/Images/n02086910-papillon',
       '../dog_breeds/images/Images/n02088466-bloodhound']

part1 = [r.replace('../dog_breeds/images/Images/', '')[10:] for r in res]
print(part1)

Output is:

['Chihuahua', 'Japanese_spaniel', 'papillon', 'bloodhound']
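The approach above relies on the exact directory string and on the prefix always being 10 characters long. If either might vary, a possible alternative is to take the last path component and strip the prefix by pattern. This is only a sketch: the prefix shape (the letter "n", some digits, then a hyphen) is an assumption based on the four sample paths, and strip_prefix is a hypothetical helper name.

```python
import re
from pathlib import PurePosixPath

# Assumed prefix shape, inferred from the samples: "n", digits, then a hyphen.
PREFIX = re.compile(r"^n\d+-")


def strip_prefix(path):
    """Drop the directory part, then the leading 'n<digits>-' prefix."""
    # .name is the last path component, e.g. "n02085620-Chihuahua".
    name = PurePosixPath(path).name
    # If the prefix does not match, the name is returned unchanged.
    return PREFIX.sub("", name)


res = ['../dog_breeds/images/Images/n02085620-Chihuahua',
       '../dog_breeds/images/Images/n02085782-Japanese_spaniel',
       '../dog_breeds/images/Images/n02086910-papillon',
       '../dog_breeds/images/Images/n02088466-bloodhound']

part1 = [strip_prefix(r) for r in res]
print(part1)
```

For the four sample paths this should give the same list as the replace-and-slice version, without hard-coding the directory or the prefix length.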

Upvotes: 1
