Chipmunkafy

Reputation: 586

Regex: Output in between two specific words

Text:

ITEM 1A.    RISK FACTORS 

    The following is a description of the principal risks inherent in our business.

ITEM 1B.    UNRESOLVED STAFF COMMENTS 

    Not Applicable.

Regex:

(?<=RISK).*

Got this:

ITEM 1A.    RISK **FACTORS** 

    The following is a description of the principal risks inherent in our business.

ITEM 1B.    UNRESOLVED STAFF COMMENTS 

    Not Applicable.

Expected:

ITEM 1A.    RISK **FACTORS

    The following is a description of the principal risks inherent in our business.

ITEM 1B.    UNRESOLVED STAFF COMMENTS 

    Not Applicable.**

How can I get all the text after the word RISK and before the text ITEM 1B?

Upvotes: 0

Views: 29

Answers (2)

Tim Biegeleisen

Reputation: 520988

The following pattern should work:

(?<=RISK)(.*?)(?=ITEM 1B)

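Note carefully that the demo below uses DOT ALL mode. This means that .* can match across newlines, which is the behavior you want here. Without it, the dot stops at the first line break, which is why your original pattern matched only the rest of the first line.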

Demo

If you can't use lookarounds for some reason, you can still proceed as long as your regex tool supports capture groups: match the two delimiters literally and read the text between them from the captured group.
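
As a rough sketch of the capture-group route, assuming Python's re module (the question does not say which regex tool is in use) and a shortened sample modeled on the question's text:

```python
import re

# Shortened sample modeled on the question's text (an assumption, not the real input).
text = (
    "ITEM 1A.    RISK FACTORS\n"
    "\n"
    "    The following is a description of the principal risks.\n"
    "\n"
    "ITEM 1B.    UNRESOLVED STAFF COMMENTS\n"
)

# No lookarounds: the delimiters are part of the overall match,
# and group 1 holds only the text between them.
match = re.search(r"RISK(.*?)ITEM 1B", text, flags=re.DOTALL)

between = match.group(1) if match else None
print(repr(between))
```

Here the full match still includes RISK and ITEM 1B, so only group 1 should be used downstream.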

If your regex flavor does not support DOT ALL, then one possible workaround is to use [\s\S]*:

(?<=RISK)([\s\S]*?)(?=ITEM 1B)
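
To see how the two patterns above behave, here is a minimal sketch that assumes Python's re module (the question does not name a regex flavor) and uses a sample string rebuilt from the question with simplified whitespace:

```python
import re

# Sample text rebuilt from the question (trailing spaces dropped).
text = (
    "ITEM 1A.    RISK FACTORS\n"
    "\n"
    "    The following is a description of the principal risks inherent in our business.\n"
    "\n"
    "ITEM 1B.    UNRESOLVED STAFF COMMENTS\n"
    "\n"
    "    Not Applicable.\n"
)

# Without DOTALL, '.' stops at the first newline, so the match can never reach "ITEM 1B".
no_dotall = re.search(r"(?<=RISK)(.*?)(?=ITEM 1B)", text)

# With DOTALL, '.' also matches newlines.
with_dotall = re.search(r"(?<=RISK)(.*?)(?=ITEM 1B)", text, flags=re.DOTALL)

# [\s\S] matches any character at all, so no flag is needed.
with_class = re.search(r"(?<=RISK)([\s\S]*?)(?=ITEM 1B)", text)

print(no_dotall)                                    # None
print(repr(with_dotall.group(1)))
print(with_dotall.group(1) == with_class.group(1))  # True
```

Both working variants return the same text, starting with the space before FACTORS and ending just before ITEM 1B.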

Upvotes: 1

glhr

Reputation: 4537

You can do this, which doesn't require using the s (dot all) RegEx modifier:

(?<=RISK)([\W\w]*)(?=ITEM 1B)

Demo here: https://regex101.com/r/ZUKZxy/4
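
One thing worth checking with this pattern is that [\W\w]* is greedy. On the question's text that makes no difference, because ITEM 1B appears only once, but if the end marker can repeat, the match runs to the last occurrence. A small sketch, assuming Python's re module and a made-up input string:

```python
import re

# Hypothetical input in which the end marker "ITEM 1B" appears twice.
text = "RISK alpha ITEM 1B beta ITEM 1B gamma"

# Greedy: expands as far as possible, then backtracks to the LAST "ITEM 1B".
greedy = re.search(r"(?<=RISK)([\W\w]*)(?=ITEM 1B)", text).group(1)

# Lazy: stops at the FIRST "ITEM 1B".
lazy = re.search(r"(?<=RISK)([\W\w]*?)(?=ITEM 1B)", text).group(1)

print(repr(greedy))  # ' alpha ITEM 1B beta '
print(repr(lazy))    # ' alpha '
```

If only the first section is wanted, adding the ? quantifier after the * is likely the safer choice.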

Upvotes: 0
