Broken_Window

Reputation: 2093

Matching regular expression at the end of the string with Python 2.7.13

I have the following string:

fo = "b---00b<do:YYYY>tftt_<fd>-<fd><ct><ct:MM>mmm.pdf"

And I only want to get mmm.pdf.

When I try:

import re

match = re.search(r'(>.*?\.pdf)', fo)

for g in match.groups():
    print g

I get:

>tftt_<fd>-<fd><ct><ct:MM>mmm.pdf

I thought the ? symbol would make the search stop at the first >, but the greedy pattern (>.*\.pdf) gives me the same result. What is the correct regular expression for getting mmm.pdf?
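Why the lazy quantifier does not help here: re.search returns the leftmost match, so the match begins at the first > in the string. The ? only makes .*? as short as possible from that fixed starting point, and since there is only one .pdf to reach, both versions end at the same place. A minimal sketch comparing the two patterns from the question:

```python
import re

fo = "b---00b<do:YYYY>tftt_<fd>-<fd><ct><ct:MM>mmm.pdf"

# The regex engine tries start positions from left to right and keeps the
# first one where the whole pattern can match: the '>' right after "YYYY".
greedy = re.search(r'(>.*\.pdf)', fo).group(1)
lazy = re.search(r'(>.*?\.pdf)', fo).group(1)

# Laziness only changes how far '.*?' extends from that start; with a single
# ".pdf" in the string, both patterns have to stop at the same end point.
print(greedy)
print(lazy)
print(greedy == lazy)

# The match starts at the index of the first '>', not the last one.
print(re.search(r'>.*?\.pdf', fo).start() == fo.index('>'))
```

In other words, a lazy quantifier controls where a match ends, never where it starts.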

mmm.pdf can be abcs.pdf, qwerty123.pdf, etc., and fo always has the format:

fo = "someOptionalstring<otherstring>anotherOptionalString<string>optionalstring<string>mmm.pdf"

Plain strings (which can be empty) and <strings> (which are never empty) can alternate any number of times. I could find regular expressions to extract those values, but not the desired string at the end.
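Because the layout is described completely, one option is to spell it out in the pattern and anchor it at both ends. This is a sketch under assumptions the question does not state outright: plain segments never contain < or >, and <...> segments contain no nested angle brackets. The name LAYOUT and the extra sample strings are made up for illustration:

```python
import re

# Zero or more (optional plain text, then a non-empty <...> tag) pairs,
# followed by the captured file name, which must end the string.
# Assumption: plain text never contains '<' or '>'.
LAYOUT = re.compile(r'^(?:[^<>]*<[^<>]+>)*([^<>]*\.pdf)$')

samples = [
    "b---00b<do:YYYY>tftt_<fd>-<fd><ct><ct:MM>mmm.pdf",
    "<a>abcs.pdf",
    "x<b>y<c>qwerty123.pdf",
]

for s in samples:
    m = LAYOUT.match(s)
    # match() returns None if the string does not follow the layout
    print(m.group(1) if m else None)
```

This prints mmm.pdf, abcs.pdf and qwerty123.pdf; a string whose last segment is a tag instead of a file name does not match at all.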

I could write an algorithm that uses endswith() and looks for the last > character, but I want to try regular expressions for learning purposes.
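For comparison, the non-regex idea can be sketched with str.endswith() and str.rfind(). This is one possible reading of that approach, not code from the question; the helper name last_segment is hypothetical:

```python
fo = "b---00b<do:YYYY>tftt_<fd>-<fd><ct><ct:MM>mmm.pdf"

def last_segment(s):
    """Return the text after the last '>' if s ends with .pdf, else None."""
    if not s.endswith('.pdf'):
        return None
    # rfind returns -1 when there is no '>', so the slice then starts at
    # index 0 and the whole string is returned.
    return s[s.rfind('>') + 1:]

print(last_segment(fo))
```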

Upvotes: 0

Views: 79

Answers (2)

perseverance

Reputation: 73

This also works if there are always exactly 3 characters before the dot:

match = re.search(r'>(.{3}\.pdf)', fo)
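The fixed-width pattern depends entirely on that assumption. With the other file names mentioned in the question (abcs.pdf, qwerty123.pdf) it finds no match. A quick check; the prefixes of the extra sample strings are made up:

```python
import re

pattern = r'>(.{3}\.pdf)'

samples = [
    "b---00b<do:YYYY>tftt_<fd>-<fd><ct><ct:MM>mmm.pdf",  # 3-character name
    "b---00b<do:YYYY>abcs.pdf",                          # 4-character name
    "b---00b<do:YYYY>qwerty123.pdf",                     # 9-character name
]

for s in samples:
    m = re.search(pattern, s)
    # re.search returns None when nothing matches, so guard before .group()
    print(m.group(1) if m else None)
```

This prints mmm.pdf, None, None; calling match.groups() directly on the last two would raise an AttributeError.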

Upvotes: 0

Toto

Reputation: 91385

Use [^>]*\.pdf instead:

where [^>]* matches zero or more characters that are not >.

import re

fo = "b---00b<do:YYYY>tftt_<fd>-<fd><ct><ct:MM>mmm.pdf"
match = re.search(r'([^>]*\.pdf)', fo)
for g in match.groups():
    print g

Output:

mmm.pdf
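A possible refinement, not part of the original answer: anchoring the pattern with $ ties the match to the end of the string, which matters if ".pdf" can also appear earlier in the string. The sample strings besides the first are made up for illustration:

```python
import re

samples = [
    "b---00b<do:YYYY>tftt_<fd>-<fd><ct><ct:MM>mmm.pdf",
    "<x>qwerty123.pdf",
    "old.pdf<x>abcs.pdf",  # a ".pdf" that is not at the end
]

for fo in samples:
    # Without $, the leftmost run of non-'>' characters ending in ".pdf" wins.
    unanchored = re.search(r'[^>]*\.pdf', fo).group()
    # With $, the match must end the string, so it is the text after the last '>'.
    anchored = re.search(r'[^>]*\.pdf$', fo).group()
    print(unanchored + ' ' + anchored)
```

The two patterns agree on the first two strings; on the third, only the anchored one returns abcs.pdf. Using group() also avoids looping over groups() when a single value is wanted.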

Upvotes: 2
