Reputation: 23
Consider the following text:
{\Largefont\it Hello world!} Some text. { \Hugefont \sl Thanks.}
I am trying to write a regular expression which will:
The regex
re.compile(r'\{\s*[^{}]+\}')
does the first part of the job. How do I accomplish the second part? In particular, I do not want \Largefont\it to be treated as a single word but rather as two separate words, \Largefont and \it. The expected output is:
{\Largefont\it Hello world!}
{ \Hugefont \sl Thanks.}
Thank you.
Upvotes: 2
Views: 188
Reputation: 18357
You need a positive lookahead to ensure that the content inside the braces follows the required pattern. Here is a regex you can use:
(?<=\{)(?=\s*\\[^{}\\]*font)[^{}]+(?=\})
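A minimal Python sketch for trying this pattern on the sample line from the question. The two extra groups, {plain} and {\it no size}, are hypothetical additions, included only to show that groups whose first command does not contain font are skipped. Because lookarounds are zero-width, the matches do not include the braces; the loop adds them back when printing.

```python
import re

# Sample line from the question, plus two hypothetical groups that should be skipped.
text = r"{\Largefont\it Hello world!} Some text. { \Hugefont \sl Thanks.} {plain} {\it no size}"

pattern = re.compile(r"(?<=\{)(?=\s*\\[^{}\\]*font)[^{}]+(?=\})")

# findall returns the text between the braces for each qualifying group.
matches = pattern.findall(text)

for m in matches:
    # Re-attach the braces that the lookbehind/lookahead left out of the match.
    print("{" + m + "}")
```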
Explanation:
(?<=\{) - Positive lookbehind to ensure the text is preceded by a { character
(?=\s*\\[^{}\\]*font) - Positive lookahead to ensure the content inside the curly brackets starts with optional whitespace, then a \, then a first word containing font (any characters before font must not be {, } or \)
[^{}]+ - Matches the intended text (one or more characters other than { or })
(?=\}) - Positive lookahead to ensure the matched content is followed by a closing curly bracket
Upvotes: 1
Reputation: 10360
Try this regex:
(?<={)\s*\\[^\\]*font[^{}]*(?=})
Explanation:
(?<={) - positive lookbehind to make sure that the current position is immediately preceded by a {
\s*\\ - matches 0+ whitespace characters followed by a \
[^\\]*font - matches 0+ occurrences of any character that is not a \, followed by the substring font
[^{}]* - matches 0+ occurrences of any character that is neither a { nor a }. This subpart makes sure that you are getting the content of the innermost curly brackets
(?=}) - positive lookahead to make sure that the current position is immediately followed by a }
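As a rough check, the pattern can be run against the question's sample line in Python. The second string below is a made-up input, not taken from the question; it illustrates a caveat: [^\\]* also accepts { and }, so a match can start in one group and end in a later one.

```python
import re

pattern = re.compile(r"(?<={)\s*\\[^\\]*font[^{}]*(?=})")

# Sample line from the question.
sample = r"{\Largefont\it Hello world!} Some text. { \Hugefont \sl Thanks.}"
sample_matches = pattern.findall(sample)
print(sample_matches)

# Hypothetical input: the first group has no font command,
# and the second group contains "font" but no backslash.
tricky = r"{\it a} {xfont}"
tricky_matches = pattern.findall(tricky)
print(tricky_matches)
```

If that cross-group behavior matters for your data, replacing [^\\]* with [^{}\\]* keeps the search for font inside a single pair of braces.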
Upvotes: 1