MLNLPEnhusiast

Reputation: 153

Regex Search end of line and beginning of next line

I am trying to write a regex that matches a keyword at the end of a line together with the word at the beginning of the next line (if present).

I have tried the regex below, but it does not return the desired result:

re.compile(fr"\s(?!^)(keyword1|keyword2|keyword3)\s*\$\n\r\((\w+\W+|W+\w+))", re.MULTILINE | re.IGNORECASE)

For example, my input is:

sentence = """ This is my keyword
/n value"""

The output in the above case should be: keyword value

Thanks in advance

Upvotes: 1

Views: 1763

Answers (3)

The fourth bird

Reputation: 163352

You could match the keyword (or use an alternation to match more keywords) and take trailing tabs and spaces into account, both after the keyword and after the newline.

Using 2 capturing groups as in the pattern you tried:

(?<!\S)(keyword)[\t ]*\r?\n[\t ]*(\w+)(?!\S)

Explanation

  • (?<!\S) Negative lookbehind, assert what is directly on the left is not a non whitespace char
  • (keyword) Capture in group 1 matching the keyword
  • [\t ]* Match 0+ tabs or spaces
  • \r?\n Match a newline, optionally preceded by a carriage return
  • [\t ]* Match 0+ tabs or spaces
  • (\w+) Capture group 2 match 1+ word chars
  • (?!\S) Negative lookahead, assert what is directly on the right is not a non whitespace char

For example:

import re

regex = r"(?<!\S)(keyword)[\t ]*\r?\n[\t ]*(\w+)(?!\S)"
test_str = (" This is my keyword\n"
    " value")

matches = re.search(regex, test_str)

if matches:
    print('{} {}'.format(matches.group(1), matches.group(2)))

Output

keyword value
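The question lists several keywords and uses re.IGNORECASE, so the same pattern can be extended with an alternation. The following is a minimal sketch, not a definitive implementation: the keyword list, the helper name find_pairs, and the sample string are assumptions made for illustration.

```python
import re

# Hypothetical keyword list; substitute your own terms.
keywords = ["keyword1", "keyword2", "keyword3"]

# re.escape guards against keywords that contain regex metacharacters.
alternation = "|".join(re.escape(k) for k in keywords)
pattern = re.compile(
    rf"(?<!\S)({alternation})[\t ]*\r?\n[\t ]*(\w+)(?!\S)",
    re.IGNORECASE,
)

def find_pairs(text):
    """Return (keyword, next_word) tuples for each keyword that ends a line."""
    return [(m.group(1), m.group(2)) for m in pattern.finditer(text)]

# Assumed sample: trailing spaces, leading spaces, and a Windows line ending.
sample = "This is my keyword1  \n  value\nand another Keyword2\r\nother"
print(find_pairs(sample))
```

Using finditer instead of search collects every keyword that ends a line, which may or may not be what you need if only the first occurrence matters.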

Upvotes: 1

Emma

Reputation: 27723

My guess is that, depending on the number of newlines you might have, an expression similar to:

\b(keyword1|keyword2|keyword3)\b[\r\n]{1,2}(\S+)

might be somewhat close, and the value is in \2. You can also make the first group non-capturing:

\b(?:keyword1|keyword2|keyword3)\b[\r\n]{1,2}(\S+)

Then \1 is the value.
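As a rough sketch of how the non-capturing variant behaves in Python (written here with \r inside the character class; the helper name value_after and the sample strings are assumptions for illustration):

```python
import re

# Non-capturing keyword group, so the value ends up in group 1.
pattern = re.compile(r"\b(?:keyword1|keyword2|keyword3)\b[\r\n]{1,2}(\S+)")

def value_after(text):
    """Return the token that starts the line after a keyword, or None."""
    m = pattern.search(text)
    return m.group(1) if m else None

print(value_after("This is my keyword1\nvalue"))    # value
print(value_after("This is my keyword1\r\nvalue"))  # value
# No match: this pattern does not allow spaces around the line break.
print(value_after("This is my keyword1\n value"))   # None
```

The last call suggests that input with a leading space on the second line, as in the question's sample, would need optional whitespace such as [\t ]* added around the line break.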


If you wish to explore, simplify, or modify the expression, regex101.com explains it in its top-right panel and shows how it would match against some sample inputs.


Upvotes: 0

Nick Reed

Reputation: 5059

How about \b(keyword)\n(\w+)\b?

\b(keyword)\n(\w+)\b

\b                      get a word boundary
  (keyword)             capture keyword (replace with whatever you want)
           \n           match a newline
             (\w+)      capture some word characters, one or more
                  \b    get a word boundary

Because keyword and \w+ are in capture groups, you can reference them as you wish later in your code.
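A short sketch of what referencing the groups could look like in Python, both through the match object and through backreferences in a replacement string. The sample text is an assumption and has no whitespace around the line break, since this pattern does not allow any.

```python
import re

pattern = re.compile(r"\b(keyword)\n(\w+)\b")

# Assumed sample input with the keyword directly before the newline.
text = "This is my keyword\nvalue"

joined = None
m = pattern.search(text)
if m:
    # group(1) is the keyword, group(2) is the word on the next line.
    joined = f"{m.group(1)} {m.group(2)}"
print(joined)  # keyword value

# \1 and \2 in the replacement refer to the same two groups,
# so this joins the two lines with a single space.
merged = pattern.sub(r"\1 \2", text)
print(merged)  # This is my keyword value
```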

Upvotes: 0
