jason
jason

Reputation: 4449

regex in google sheets, match previous period or semi-colon

I want the return to be the sentence with the double question mark.

I'm using the formula =trim(regexextract(H4,"[\.;](.*?)\?\?"))

ex 1:

blah blah. 1 The quick brown fox jumps over the lazy dog?? blah blah

I get the right return. 1 The quick brown fox jumps over the lazy dog

but ex 2:

blah blah. 1 The quick brown fox jumps over the lazy dog. blah blah ; 2 The quick brown fox jumps over the lazy dog?? blah blah, blah blah?

I get

1 The quick brown fox jumps over the lazy dog. blah blah ; 2 The quick brown fox jumps over the lazy dog

I only want

2 The quick brown fox jumps over the lazy dog

the sentence with the double question mark.

not sure how to make it to lookback from the question mark to the first period or semi-colon since RE2 doesn't support lookback.

Here is a sample https://docs.google.com/spreadsheets/d/1mPFiWZ4WCOSPoae-ga3TaJOFlIfPIp7ALf-4ht6xpFo/edit#gid=2100307022

Upvotes: 0

Views: 222

Answers (2)

doubleunary
doubleunary

Reputation: 19220

To start at the last digit and stop at the first non-alphanumeric, non-whitespace character, use .*(\d[\w\s]+), like this:

=arrayformula( iferror( regexextract(B5:B, ".*(\d[\w\s]+)") ) )

This formula exactly matches the desired results you present in the sample spreadsheet.

Upvotes: 1

The fourth bird
The fourth bird

Reputation: 163632

You might use:

[.;]\s+(.*?)[^\w\s]

Explanation

  • [.;] Match either . or ;
  • \s+ Match 1+ whitespace chars
  • (.*?) Capture group 1, match any char as few as possible
  • [^\w\s] Match a single char other than a word char or whitespace char

See the capture group matches at the regex101 demo.

Example code:

=trim(regexextract(H4,"[.;]\s+(.*?)[^\w\s]"))

Upvotes: 0

Related Questions