Reputation: 468
The following code can be run directly. What I want is a list of strings: l = ['1', '2'].
However, what I get is a single string spanning from the very first \begin{figure} to the last \end{figure}. That text does contain the results I expect, but not as separate matches. I cannot figure out what is happening.
import re
text = r'''
\begin{figure}
1
\end{figure}
aaa
\begin{figure}
2
\end{figure}
'''
pattern = r"\\begin{figure}([\s\S^f]*)\\end{figure}"
r = re.findall(pattern, text)
print(r)
Upvotes: 1
Views: 93
Reputation: 522626
Your pattern had multiple problems. Here is a working version:
import re
text = r'''
\begin{figure}
1
\end{figure}
aaa
\begin{figure}
2
\end{figure}
'''
pattern = r"\\begin\{figure\}(?:(?!\\end\{figure\}).)*?(\d+).*?\\end\{figure\}"
nums = re.findall(pattern, text, flags=re.DOTALL)
print(nums) # ['1', '2']
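Notes: I am using a tempered dot, (?:(?!\\end\{figure\}).)*?, to match the content after the opening \begin{figure} marker without crossing over the closing \end{figure} marker. I also use DOTALL mode (re.DOTALL), so that . and .* can match across newlines. In addition, your pattern contained regex metacharacters such as { and }, which should be escaped with a backslash. (Python's re happens to treat a { that does not start a valid {m,n} quantifier as a literal, which is why your original pattern still ran, but escaping makes the intent explicit.)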
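The same tempered-dot idea can capture the whole figure body instead of only the digits. The sketch below is an illustration, not the exact pattern from this answer: it assumes the same sample text, builds the markers with re.escape() rather than escaping them by hand, and compares the result with and without re.DOTALL. The names begin, end, without_dotall and with_dotall are just illustrative.

```python
import re

text = r'''
\begin{figure}
1
\end{figure}
aaa
\begin{figure}
2
\end{figure}
'''

# re.escape() backslash-escapes regex metacharacters such as '\', '{' and '}',
# so the literal markers can be dropped into a pattern safely.
begin = re.escape(r"\begin{figure}")
end = re.escape(r"\end{figure}")

# Tempered greedy token: consume one character at a time, but only where
# the closing marker does not start, so a match never crosses \end{figure}.
pattern = begin + r"((?:(?!" + end + r").)*)" + end

# Without DOTALL, '.' cannot match the newline right after \begin{figure}.
without_dotall = re.findall(pattern, text)
# With DOTALL, '.' also matches newlines, so each figure body is captured.
with_dotall = re.findall(pattern, text, flags=re.DOTALL)

print(without_dotall)                     # []
print([s.strip() for s in with_dotall])  # ['1', '2']
```

Because the tempered token refuses to step onto \end{figure}, it can even be greedy here and still stop at the nearest closing marker.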
Upvotes: 0
Reputation: 428
The * quantifier is greedy: it matches as many characters as possible. This means the match extends to the last occurrence of \end{figure}.
If you only want to match as few characters as needed, use the lazy *? instead:
pattern = r"\\begin{figure}([\s\S^f]*?)\\end{figure}"
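A quick side-by-side of the two quantifiers on the question's sample text might look like this. It is a sketch: the names greedy and lazy are illustrative, the braces are escaped for clarity, and the character class is simplified to [\s\S], since the extra ^ and f in [\s\S^f] are already covered by \S.

```python
import re

text = r'''
\begin{figure}
1
\end{figure}
aaa
\begin{figure}
2
\end{figure}
'''

# [\s\S] matches any character, including newlines, so no DOTALL flag is needed.
greedy = r"\\begin\{figure\}([\s\S]*)\\end\{figure\}"
lazy = r"\\begin\{figure\}([\s\S]*?)\\end\{figure\}"

# Greedy '*' runs to the LAST \end{figure}, giving one big match.
print(re.findall(greedy, text))
# Lazy '*?' stops at the FIRST \end{figure} after each \begin{figure}.
print(re.findall(lazy, text))
# The captured bodies still include the surrounding newlines; strip them.
print([s.strip() for s in re.findall(lazy, text)])  # ['1', '2']
```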
Upvotes: 1