Miles Works

Reputation: 639

REGEX: \s\d{3,4}\b\n\n?

[version of regex - ICU via TextSoap 8 for the Mac by Unmarked Software]

In the example below I need to capture a line of text like this:

Today's XXXX ZZZZZZZ ###/#

Some paragraph of Txt......????

Here XXXX and ZZZZZZZ are words, and the # characters stand for digits. Note that there are two line breaks there: a newline after the "Today's..." line and then a blank line, followed by a paragraph of text. It's actually the paragraph of text that I am interested in. I want my regex to do two things. The first is to capture the digits, which it already does perfectly. The second is to capture the text afterwards so that I can justify it. However, I can't figure out what I need to do to match up to the last \n, where the "???" are in the text above.

Any suggestions?

Here's an example string:

https://regex101.com/r/cN3kZ7/3

Upvotes: 1

Views: 111

Answers (3)

Jan

Reputation: 43169

Something like the following?

(?:^Today)\D*(?<numbers>\d+)(?:.*\R){2}(?<text>.*)
# look for Today at the beginning of the string/line in multiline mode
# match any non-digits
# capture numbers into the group "numbers"
# match .*\R two times - this is two lines including the newline character
# capture the text into the group "text"

See a demo on regex101.com. Obviously, you could just as well leave out the group names for the number and text parts (and use $1 and $2 accordingly):

(?:^Today)\D*(\d+)(?:.*\R){2}(.*)

This will capture the text into the group $2.
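For anyone trying this outside an ICU or PCRE engine, here is a minimal Python sketch of the same idea. It rests on a few assumptions: Python's re module has no \R, so \r?\n stands in for it; named groups are spelled (?P<name>...); and the sample text is made up to follow the layout described in the question.

```python
import re

# Hypothetical sample text following the layout in the question:
# a "Today's ..." heading, a blank line, then the paragraph.
sample = (
    "Today's Daily Report 123/4\n"
    "\n"
    "Some paragraph of text......???\n"
)

# re.MULTILINE lets ^ match at the start of every line.
# \r?\n is a stand-in for \R, which Python's re does not support.
pattern = re.compile(
    r"^Today\D*(?P<numbers>\d+)(?:.*\r?\n){2}(?P<text>.*)",
    re.MULTILINE,
)

match = pattern.search(sample)
numbers = match.group("numbers") if match else None
text = match.group("text") if match else None
print(numbers)
print(text)
```

The (?:.*\r?\n){2} part consumes the rest of the heading line and the blank line, so the last group starts at the paragraph.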
If you want all the text (including other lines), you'd need some inline modifiers ((?s) and (?-s) in this case), a lazy quantifier and a stop word:

(?:^Today)\D*(\d+)(?:.*\R){2}(?s)(.*?(?=stop))(?-s)
# the same as above
# turn on single-line mode (?s) - the dot matches newline characters as well
# capture everything lazily (!) until
# the positive lookahead finds "stop" literally
# turn off the single line mode afterwards - (?-s)

See an example for this approach here.
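The stop-word variant can be sketched in Python as well, with some assumptions. Python's re does not allow switching a flag on in the middle of a pattern, so the sketch uses the scoped form (?s:...) to make the dot match newlines only inside that group. The word "stop" and the sample text are placeholders, and \r?\n again stands in for \R.

```python
import re

# Hypothetical sample: the paragraph spans two lines and is followed
# by the literal stop word "stop".
multi_sample = (
    "Today's Daily Report 123/4\n"
    "\n"
    "First line of paragraph\n"
    "second line of paragraph\n"
    "stop\n"
)

# (?s:...) turns on dot-matches-newline for that group only.
# (.*?) is lazy, so it ends at the first place where the lookahead
# (?=stop) succeeds.
stop_pattern = re.compile(
    r"^Today\D*(\d+)(?:.*\r?\n){2}(?s:(.*?)(?=stop))",
    re.MULTILINE,
)

stop_match = stop_pattern.search(multi_sample)
stop_numbers = stop_match.group(1) if stop_match else None
stop_text = stop_match.group(2) if stop_match else None
print(stop_numbers)
print(repr(stop_text))
```

Because the quantifier is lazy, the capture ends at the first occurrence of the stop word, so the stop word should be something that cannot appear inside the paragraph itself.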

EDIT: In the end we used the following regex (see comments below):

^\h+\D+(\d+)(?:.*\R){2}(.+)
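As a rough illustration of how that final pattern behaves on several entries, here is a Python sketch. The assumptions: Python's re has neither \h nor \R, so [ \t]+ and \r?\n are used as approximations, and the indented two-entry sample text is invented for the example.

```python
import re

# Hypothetical sample: two indented heading lines, each followed by a
# blank line and a one-line paragraph.
entries = (
    "  Today's Daily Report 123/4\n"
    "\n"
    "First paragraph.\n"
    "\n"
    "  Today's Other Item 4567/8\n"
    "\n"
    "Second paragraph.\n"
)

# [ \t]+ approximates \h+ (horizontal whitespace); \r?\n approximates \R.
final_pattern = re.compile(
    r"^[ \t]+\D+(\d+)(?:.*\r?\n){2}(.+)",
    re.MULTILINE,
)

# findall returns one (digits, paragraph) tuple per heading.
pairs = final_pattern.findall(entries)
for digits, paragraph in pairs:
    print(digits, "->", paragraph)
```

Since the last group uses .+ without single-line mode, it captures only the first line of each paragraph.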

Upvotes: 3

Simon McClive

Reputation: 2616

Something like ^Today\'s\s.+\s(\d+)\/(\d).*\n(.*)

Upvotes: 0

Jeremy Fortune

Reputation: 2499

It sounds like you just need to enable the multiline flag.

/\s\d{3,4}\b\n.*\?{3}/gm

RegExr example. You'll probably want to put capture groups around the digits and the text, like so:

/\s(\d{3,4})\b\n(.*)\?{3}/gm
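A small Python sketch of the pattern with capture groups, under stated assumptions: the heading line ends in three or four digits, the paragraph follows directly on the next line with no blank line in between, and the paragraph ends in "???". The sample text is made up, and the /g flag is approximated with findall.

```python
import re

# Hypothetical sample: heading ends in digits, paragraph on the very
# next line, paragraph ends with three question marks.
flat_sample = (
    "Today's Daily Report 1234\n"
    "Some paragraph of text......???\n"
)

# Group 1 holds the digits; group 2 holds the paragraph text up to,
# but not including, the trailing "???".
flat_pattern = re.compile(r"\s(\d{3,4})\b\n(.*)\?{3}", re.MULTILINE)

found = flat_pattern.findall(flat_sample)
print(found)
```

If a blank line separates the heading from the paragraph, as in the question, the single \n would need to become something like \n\n? for this to match.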

Upvotes: 0
