Reputation: 4426
How can I get the string enclosed in {} after a keyword, when the number of characters between the keyword and the braces is unknown? For example:
includegraphics[x=2]{image.pdf}
Here the keyword is includegraphics and the string to be found is image.pdf, but the text between the two [] (here x=2) could be anything.
So I want to either ignore all characters between the keyword and {, or ignore everything between [].
Upvotes: 2
Views: 71
Reputation: 17506
Use re.findall:
>>> import re
>>> sample = 'includegraphics[x=2]{image.pdf}'
>>> re.findall(r'includegraphics.*?{(.*?)}', sample)
['image.pdf']
Explanation:
The re module deals with regular expressions in Python. Its findall function returns all non-overlapping occurrences of a pattern in a string.
A regular expression for the pattern you are interested in is 'includegraphics.*?{(.*?)}'. Here . stands for "any character" (except a newline, by default), while * means "0 or more times". The question mark makes the repetition non-greedy. The parentheses form a capturing group, so findall returns only the text between the braces. From the documentation:
The *, +, and ? qualifiers are all greedy; they match as much text as possible. Sometimes this behaviour isn’t desired; if the RE <.*> is matched against <H1>title</H1>, it will match the entire string, and not just <H1>. Adding ? after the qualifier makes it perform the match in non-greedy or minimal fashion; as few characters as possible will be matched. Using .*? in the previous expression will match only <H1>.
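The difference between greedy and non-greedy matching can be checked directly. The following is a minimal sketch: the first part reproduces the <H1> example from the quoted documentation, and the second applies the same idea to a made-up line containing two includegraphics entries (the file names one.pdf and two.pdf are illustrative assumptions, not from the question).

```python
import re

html = '<H1>title</H1>'

# Greedy: .* runs to the last '>' in the string.
greedy = re.match(r'<.*>', html).group(0)
# Non-greedy: .*? stops at the first '>' that lets the match succeed.
lazy = re.match(r'<.*?>', html).group(0)

print(greedy)  # <H1>title</H1>
print(lazy)    # <H1>

# Hypothetical input with two entries on one line.
line = 'includegraphics[a]{one.pdf} includegraphics[b]{two.pdf}'

# Greedy quantifiers swallow the first entry and capture only the last one.
greedy_names = re.findall(r'includegraphics.*\{(.*)\}', line)
# Non-greedy quantifiers capture each entry separately.
lazy_names = re.findall(r'includegraphics.*?\{(.*?)\}', line)

print(greedy_names)  # ['two.pdf']
print(lazy_names)    # ['one.pdf', 'two.pdf']
```

With several entries on one line, the greedy version silently drops all but the last one, which is why the non-greedy form is used in the answer.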
Please note that while using .*? should be fine in your case, in general it's better to use more specific character classes, such as \w for word characters (letters, digits, and underscore) and \d for digits, when you know in advance what the content will consist of.
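As a rough illustration of that advice, the sketch below restricts the captured text to characters that plausibly appear in a file name. The class [\w./-] is an assumption about what the file names look like, and the second entry in the sample (fig_01.png, with no bracketed options) is made up for the example.

```python
import re

# Hypothetical sample: one entry with options, one without.
sample = 'includegraphics[x=2]{image.pdf} and includegraphics{fig_01.png}'

# [\w./-]+ allows word characters, dots, slashes and hyphens only,
# so the capture cannot run across spaces or braces.
names = re.findall(r'includegraphics.*?\{([\w./-]+)\}', sample)
print(names)  # ['image.pdf', 'fig_01.png']

# \d matches digits only.
numbers = re.findall(r'\d+', 'x=2, y=10')
print(numbers)  # ['2', '10']
```

If the file names can contain other characters (spaces, for instance), the class would need to be widened accordingly.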
Upvotes: 2
Reputation: 174736
Use re.search
re.search(r'includegraphics\[[^\[\]]*\]\{([^}]*)\}', s).group(1)
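Two caveats about the one-liner above: re.search returns None when nothing matches, so calling .group(1) directly raises AttributeError in that case, and the pattern requires the [...] part to be present. The sketch below is one possible way to handle both, assuming the bracketed options may be absent; the helper name first_graphic and the sample strings are hypothetical.

```python
import re

# (?:\[[^\[\]]*\])? makes the bracketed options optional;
# ([^}]*) captures everything up to the closing brace.
PATTERN = re.compile(r'includegraphics(?:\[[^\[\]]*\])?\{([^}]*)\}')

def first_graphic(text):
    """Return the first captured file name, or None if there is no match."""
    match = PATTERN.search(text)
    return match.group(1) if match else None

print(first_graphic('includegraphics[x=2]{image.pdf}'))  # image.pdf
print(first_graphic('includegraphics{plain.png}'))       # plain.png
print(first_graphic('no keyword here'))                  # None
```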
Upvotes: 0