Reputation: 2093
I have the following string:
fo = "b---00b<do:YYYY>tftt_<fd>-<fd><ct><ct:MM>mmm.pdf"
And I only want to get mmm.pdf.
When I try:
match = re.search(r'(>.*?\.pdf)', fo)
for g in match.groups():
    print(g)
I get:
>tftt_<fd>-<fd><ct><ct:MM>mmm.pdf
I thought the ? symbol would make the search stop at the first >, but the pattern (>.*\.pdf) gives me the same result.
What is the correct regular expression for getting mmm.pdf?
mmm.pdf can be abcs.pdf, qwerty123.pdf, etc., and fo always has the format:
fo = "someOptionalstring<otherstring>anotherOptionalString<string>optionalstring<string>mmm.pdf"
The plain strings (which can be empty) and the <strings> (which are never empty) can alternate any number of times. I could find regular expressions to extract these values, but not the desired string at the end.
I could use an algorithm based on endswith() and a search for the last > character, but I want to try regular expressions for learning purposes.
Upvotes: 0
Views: 79
Reputation: 73
This also works if there are always exactly 3 characters before the dot:
match = re.search(r'>(.{3}\.pdf)', fo)
The file name is then in match.group(1).
Upvotes: 0
Reputation: 91385
Use [^>]*\.pdf instead, where [^>]* means zero or more characters that are not >:
fo = "b---00b<do:YYYY>tftt_<fd>-<fd><ct><ct:MM>mmm.pdf"
match = re.search(r'([^>]*\.pdf)', fo)
for g in match.groups():
    print(g)
Output:
mmm.pdf
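As for why the ? did not help: a non-greedy quantifier only shortens where the match ends. It does not change where the match starts, and re.search always reports the leftmost position at which a match can succeed, which here is the first > in the string. A minimal sketch comparing the two patterns follows; the variable names lazy and tail and the trailing $ anchor are additions for illustration, not part of the original pattern.

```python
import re

fo = "b---00b<do:YYYY>tftt_<fd>-<fd><ct><ct:MM>mmm.pdf"

# Non-greedy: the match still begins at the first '>' in the string,
# because the engine takes the leftmost position where a match can succeed.
lazy = re.search(r'>.*?\.pdf', fo).group(0)

# Negated class: the match cannot contain '>', so it can only start
# after the last one. The trailing $ (an addition here) pins the match
# to the end of the string.
tail = re.search(r'[^>]*\.pdf$', fo).group(0)

print(lazy)
print(tail)
```

The first print shows the same long result as in the question, and the second shows only the file name. The $ is optional for this input; it just guards against a .pdf that appears earlier in the string.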
Upvotes: 2