fufu

Reputation: 11

Print the image name from each matching line using a regular expression

I have a text file with 3589 lines. Every fifth line contains an image URL, and I want to extract the image name from it using a regular expression.

The line looks like: URL,https://google.com/Document/Projects/Images/Turk/IMG-2021-606-WA1227.jpg

I only need to print the image name, which is: "IMG-2021-606-WA1227.jpg"

Code I have so far:

# Read the CSV line by line; the with block closes the file automatically.
with open('./data/input/filesn.csv', 'r') as file1:
    for count, line in enumerate(file1):
        print("Line{}: {}".format(count, line.strip()))
        if "URL" in line:
            print("Image:")

Upvotes: 0

Views: 598

Answers (1)

ThePyGuy

Reputation: 18426

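It's a good idea to compile a pattern if you need to use it multiple times. Add import re at the top of the script and pattern = re.compile(r'IMG.*?\.jpg') before the loop (the r prefix makes it a raw string, so \. is not treated as a string escape). Then, in the print statement inside the if block, print the substring that matches the pattern: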

print("Image:", pattern.findall(line)[0])

Understanding the pattern 'IMG.*?\.jpg':

  • The pattern first looks for the literal text IMG in the line.
  • If IMG is found, .*?\.jpg lazily matches as few characters as possible, up to and including the first occurrence of .jpg.
  • If IMG is not found, or no .jpg follows it, the line does not match the regex.
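
To see these three points in action, here is a small sketch run against the sample line from the question. The two-image string and the example.com line are made up for illustration; they show why the lazy .*? matters and what a non-matching line returns.

```python
import re

pattern = re.compile(r'IMG.*?\.jpg')

# Sample line taken from the question.
line = "URL,https://google.com/Document/Projects/Images/Turk/IMG-2021-606-WA1227.jpg"
match = pattern.search(line)
image_name = match.group(0) if match else None
print(image_name)  # IMG-2021-606-WA1227.jpg

# Hypothetical line holding two image names.
two_images = "IMG-1.jpg,IMG-2.jpg"
lazy_matches = pattern.findall(two_images)               # stops at the first .jpg each time
greedy_matches = re.findall(r'IMG.*\.jpg', two_images)   # runs on to the last .jpg
print(lazy_matches)    # ['IMG-1.jpg', 'IMG-2.jpg']
print(greedy_matches)  # ['IMG-1.jpg,IMG-2.jpg']

# Hypothetical line with no IMG...jpg substring: search returns None.
no_match = pattern.search("URL,https://example.com/readme.txt")
print(no_match)  # None
```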

If you get an IndexError, the line does not contain a substring that matches the pattern, so findall returned an empty list. It is safer to store the result in a variable and print the first item only if the list is non-empty:

img = pattern.findall(line)
if img:
    print(img[0])
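
Putting the pieces together, a minimal sketch of the whole loop might look like the following. The helper name extract_image_names is hypothetical, and the first and last sample lines are invented to show that lines without "URL" or without an image name are skipped. It uses search instead of findall because only the first match per line is needed.

```python
import re

IMG_PATTERN = re.compile(r'IMG.*?\.jpg')

def extract_image_names(lines):
    """Return the first image name from each line that contains 'URL'."""
    names = []
    for line in lines:
        if "URL" not in line:
            continue  # same filter as the question's loop
        match = IMG_PATTERN.search(line)
        if match:  # skip URL lines that hold no IMG...jpg substring
            names.append(match.group(0))
    return names

# Only the middle line comes from the question; the others are made up.
sample = [
    "Name,example",
    "URL,https://google.com/Document/Projects/Images/Turk/IMG-2021-606-WA1227.jpg",
    "URL,https://google.com/Document/Projects/Images/Turk/",
]
names = extract_image_names(sample)
for name in names:
    print("Image:", name)
```

To run it on the real file, pass the open file object instead of the list, for example inside with open('./data/input/filesn.csv') as f: call extract_image_names(f).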

Upvotes: 1
