I have a text file with 3589 lines. Every 5 lines I want to extract the image name from it using a regular expression.
The line looks like: URL,https://google.com/Document/Projects/Images/Turk/IMG-2021-606-WA1227.jpg
I only need to print the image name, which is "IMG-2021-606-WA1227.jpg".
Code I have so far:
file1 = open('./data/input/filesn.csv', 'r')
Lines = file1.readlines()
count = 0
for line in Lines:
    print("Line{}: {}".format(count, line.strip()))
    count += 1
    if "URL" in line :
        print("Image:")
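It's a good idea to compile a pattern if you need to use it multiple times. Add import re at the top of the script and
pattern = re.compile(r'IMG.*?\.jpg')
before the loop (the r prefix makes it a raw string, so the backslash in \. reaches the regex engine unchanged). Then, inside the if block, replace print("Image:") with a call that prints the substring matching the pattern:
print("Image:", pattern.findall(line)[0])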
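Put together with the loop from the question, the approach looks roughly like the sketch below. It is only a sketch under assumptions: io.StringIO stands in for open('./data/input/filesn.csv', 'r') so it runs without the real file, the "Title,example" line is made-up sample data, and the found list is a hypothetical addition used only to collect the results.

```python
import io
import re

# Compile once, before the loop; the raw string keeps \. as "literal dot"
pattern = re.compile(r'IMG.*?\.jpg')

# Stand-in for open('./data/input/filesn.csv', 'r'); the first line is made up
file1 = io.StringIO(
    "Title,example\n"
    "URL,https://google.com/Document/Projects/Images/Turk/IMG-2021-606-WA1227.jpg\n"
)

found = []  # hypothetical: collects every extracted image name
for count, line in enumerate(file1):  # enumerate replaces the manual counter
    print("Line{}: {}".format(count, line.strip()))
    if "URL" in line:
        name = pattern.findall(line)[0]  # first IMG...jpg substring on the line
        found.append(name)
        print("Image:", name)
```

With a real file, iterating over it inside a with open(...) as file1: block works the same way and closes the file automatically.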
Understanding the pattern 'IMG.*?\.jpg':
IMG matches the literal text IMG in the line.
.*?\.jpg matches as few characters as possible, up to and including the first occurrence of .jpg (the ? makes .* lazy, and \. matches a literal dot).
If .jpg is not found after IMG, or IMG is not found at all, the line doesn't match the regex and findall() returns an empty list.
If you get an IndexError, that means the line does not contain a substring that matches the pattern, so it's better to store the result in a local variable and print the first item only if the list is non-empty:
img = pattern.findall(line)
if img:
    print(img[0])
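The behaviour described above (lazy matching up to the first .jpg, and an empty list when nothing matches) can be checked with a small sketch. The first_image helper and the example.com URLs are hypothetical, made up for illustration; the greedy pattern is shown only for contrast and is not part of the answer's approach.

```python
import re

pattern = re.compile(r'IMG.*?\.jpg')

def first_image(line):
    """Hypothetical helper: first IMG...jpg substring in line, or None."""
    img = pattern.findall(line)  # [] when the line does not match
    if img:
        return img[0]
    return None

# Made-up line with two image URLs: lazy .*? stops at the FIRST .jpg
line = "URL,https://example.com/IMG-1.jpg,https://example.com/IMG-2.jpg"
print(first_image(line))                   # IMG-1.jpg
# A greedy .* instead runs on to the LAST .jpg on the line
print(re.findall(r'IMG.*\.jpg', line)[0])  # IMG-1.jpg,https://example.com/IMG-2.jpg

# No IMG...jpg here: findall() gives [], so indexing [0] would raise IndexError
print(first_image("URL,https://example.com/logo.png"))  # None
```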