search for string embedded in {} after keyword

Question

How can I get the string enclosed in {} after a keyword when the number of characters between the keyword and the braces is unknown? For example:

includegraphics[x=2]{image.pdf}

The keyword would be includegraphics and the string to be found is image.pdf, but the text between the [] (here x=2) could be anything. So I want to ignore all characters between the keyword and {, or ignore everything between the [].

Sebastian Wozny · Accepted Answer

Use re.findall

>>> import re
>>> sample = 'includegraphics[x=2]{image.pdf}'
>>> re.findall('includegraphics.*?{(.*?)}', sample)
['image.pdf']

Explanation:

The re module provides regular expression support in Python. Its findall function returns all non-overlapping occurrences of a pattern in a string.

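A regular expression for the pattern you are interested in is 'includegraphics.*?{(.*?)}'. Here . stands for "any character" (except a newline, by default), while * means 0 or more times. The question mark after the * makes the match non-greedy. The parentheses form a capturing group, so findall returns only the text between the braces rather than the whole match. From the documentation: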

The *, +, and ? qualifiers are all greedy; they match as much text as possible. Sometimes this behaviour isn’t desired; if the RE <.*> is matched against '<H1>title</H1>', it will match the entire string, and not just '<H1>'. Adding ? after the qualifier makes it perform the match in non-greedy or minimal fashion; as few characters as possible will be matched. Using .*? in the previous expression will match only '<H1>'.
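To see the difference on the input from the question, here is a small sketch comparing the greedy and non-greedy variants. The second includegraphics in the sample string and the file name other.png are made up for illustration; the braces are escaped here, which is optional but makes the intent explicit.

```python
import re

# Hypothetical sample with two occurrences of the keyword.
sample = 'includegraphics[x=2]{image.pdf} and includegraphics{other.png}'

# Greedy: the first .* runs as far as it can and backtracks only to the LAST '{'.
greedy = re.findall(r'includegraphics.*\{(.*)\}', sample)

# Non-greedy: each .*? stops at the first place the rest of the pattern can match.
lazy = re.findall(r'includegraphics.*?\{(.*?)\}', sample)

print(greedy)  # ['other.png']
print(lazy)    # ['image.pdf', 'other.png']
```

The greedy version swallows the first file name as part of the text it skips, which is why the non-greedy form is the one to use here.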

Please note that while using .*? should be fine in your case, in general it's better to use more specific character classes, such as \w for word characters (letters, digits and underscore) and \d for digits, when you know in advance what the content is going to consist of.
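As one possible way to follow that advice, the sketch below replaces the two .*? parts with negated character classes: an optional [...] block that may contain anything except ], followed by a {...} argument that may contain anything except braces. The names PATTERN and find_graphics and the sample text are assumptions made for this example, not part of the original answer, and nested brackets or braces are not handled.

```python
import re

# Keyword, then an optional [options] block, then the {argument}.
# [^\]]* matches anything but ']', [^{}]* matches anything but braces.
PATTERN = re.compile(r'includegraphics(?:\[[^\]]*\])?\{([^{}]*)\}')

def find_graphics(text):
    """Return the brace-enclosed argument of every includegraphics in text."""
    return PATTERN.findall(text)

# Hypothetical input: the last occurrence is not directly followed by [..] or {..}.
sample = ('includegraphics[x=2]{image.pdf} includegraphics{b.png} '
          'includegraphics text {not this}')

result = find_graphics(sample)
print(result)  # ['image.pdf', 'b.png']
```

Unlike the .*? version, this pattern does not pick up {not this}, because it only allows an optional bracketed block between the keyword and the opening brace.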
