pmath

Reputation: 23

Find text in innermost curly brackets starting with word with given substring

Consider the following text:

{\Largefont\it Hello world!} Some text. {   \Hugefont \sl Thanks.}

I am trying to write a regular expression which will:

  1. identify innermost curly brackets in the full text, and
  2. check if the first word in the identified block of text starts with '\' and has a substring 'font' in it.

The regex

re.compile(r'\{\s*[^{}]+\}')

does the first part of the job. How do I accomplish the second part? In particular, I do not want \Largefont\it to be treated as a single word but rather as two separate words \Largefont and \it. The expected output is:

{\Largefont\it Hello world!}
{   \Hugefont \sl Thanks.}

Thank you.

Upvotes: 2

Views: 188

Answers (2)

Pushpesh Kumar Rajwanshi

Reputation: 18357

Use a positive lookahead to check that the content inside the brackets starts with the required pattern before matching it. Here is a regex you can use:

(?<=\{)(?=\s*\\[^{}\\]*font)[^{}]+(?=\})

Explanation:

  • (?<=\{) - Positive lookbehind to ensure the match is immediately preceded by a { character
  • (?=\s*\\[^{}\\]*font) - Positive lookahead to ensure the content starts with optional whitespace, then a \, then any characters other than {, } or \, followed by font. Because [^{}\\]* also admits spaces, font may appear in a later word as long as no second \ comes before it
  • [^{}]+ - Matches the intended text; excluding { and } restricts the match to an innermost block
  • (?=\}) - Positive lookahead to ensure the matched content is immediately followed by a closing }
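
As a quick check, here is a minimal Python sketch that runs the pattern above on the sample text from the question. The second pattern, with_braces, is an assumption and not part of this answer: it consumes the braces instead of using lookarounds, so that the matches include them as in the question's expected output.

```python
import re

text = r"{\Largefont\it Hello world!} Some text. {   \Hugefont \sl Thanks.}"

# Pattern from this answer: the lookarounds keep the braces out of the match.
inner = re.compile(r"(?<=\{)(?=\s*\\[^{}\\]*font)[^{}]+(?=\})")

# Hypothetical variant (not from the answer): match the braces themselves so
# that each result includes them, as in the question's expected output.
with_braces = re.compile(r"\{(?=\s*\\[^{}\\]*font)[^{}]+\}")

contents = inner.findall(text)       # block contents without braces
blocks = with_braces.findall(text)   # whole blocks, braces included

for block in blocks:
    print(block)
```

With the sample text, this should print the two lines listed as expected output in the question.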

Upvotes: 1

Gurmanjot Singh

Reputation: 10360

Try this regex:

(?<={)\s*\\[^\\]*font[^{}]*(?=})

Explanation:

  • (?<={) - positive lookbehind to make sure that the current position is immediately preceded by a {
  • \s*\\ - matches 0+ whitespace characters followed by a \
  • [^\\]*font - matches 0+ occurrences of any character other than \, followed by the substring font. Note that this class does not exclude spaces or braces
  • [^{}]* - matches 0+ occurrences of any character that is neither a { nor a }. This keeps the rest of the match, after font, from crossing a brace
  • (?=}) - positive lookahead to make sure that the current position is immediately followed by a }
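
The sketch below compares the pattern above with a stricter variant in Python. The variant and the tricky test string are assumptions added for illustration: strict treats the first word as the \ plus the run of characters up to the next whitespace, backslash, or brace, and requires font inside that run.

```python
import re

# Pattern from this answer.
answer = re.compile(r"(?<={)\s*\\[^\\]*font[^{}]*(?=})")

# Hypothetical stricter variant (an assumption about what "first word" means):
# the class before "font" also excludes whitespace and braces, so "font" must
# sit inside the first backslash-word of an innermost block.
strict = re.compile(r"(?<={)\s*\\[^\s\\{}]*font[^{}]*(?=})")

sample = r"{\Largefont\it Hello world!} Some text. {   \Hugefont \sl Thanks.}"
# Made-up input: "font" appears only outside the first word of each block.
tricky = r"{\it uses a font} and {\it a} {font}"

print(answer.findall(sample))
print(strict.findall(sample))
print(answer.findall(tricky))
print(strict.findall(tricky))
```

Both patterns return the same two block contents for the sample text. On the tricky string, the original pattern still reports matches (one of them spanning a closing brace), while the stricter variant returns an empty list.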

Upvotes: 1
