Kiel Pagtama
Kiel Pagtama

Reputation: 55

How to use REGEXTRACT to extract certain characters between two strings

I am trying to extract a person's name between different characters. For example, the cells contains this information

PATIENT: 2029985 - COLLINS, JUNIOR .
PATIENT: 1235231-02 - JERRY JR, PATRICK .
PATIENT: 986435--EXP-- - JULIUS, DANIEL .
PATIENT: 2021118-02 - DRED-HARRY, KEVIN .

My goal is to use one REGEXTRACT formula to get the following:

COLLINS, JUNIOR
JERRY JR, PATRICK
JULIUS, DANIEL
LOVE ALSTON, BRENDA

So far, I have come up with the formula:

=ARRAYFORMULA(REGEXEXTRACT(B3:B, "-(.*)\."))

Where B3 contains the first information

Using that formula, I get:

COLLINS, JUNIOR
02 - JERRY JR, PATRICK
02 - LOVE-ALSTON, BRENDA 
-EXP-- - JULIUS, DANIEL
02 - DRED-HARRY, KEVIN

I managed to get the first name down but how do I go about extracting the rest.

Upvotes: 2

Views: 817

Answers (2)

RavinderSingh13
RavinderSingh13

Reputation: 133538

1st solution: With your shown samples, please try following regex.

Online demo for above regex

^PATIENT:.*-\s+([^.]*?)\s*\.

OR try following Google-sheet forumla:

=ARRAYFORMULA(REGEXEXTRACT(B3:B, "^PATIENT:.*-\s+([^.]*?)\s*\."))

Explanation: Checking if line/value starts from PATIENT followed by : till -(using greedy mechanism here), which is followed by spaces(1 or more occurrences). Then creating one and only capturing group which contains everything just before .(dot) in it making it non-greedy, closing capturing group which is followed by spaces(0 or more occurrences) followed by a literal dot.



2nd solution: Using lazy match approach in regex, please try following regex.

.*?\s-\s([^.]*?)\s*\.

Google-sheet formula will be as follows:

=ARRAYFORMULA(REGEXEXTRACT(B3:B, ".*?\s-\s([^.]*?)\s*\."))

Online demo for above regex

Upvotes: 1

Wiktor Stribiżew
Wiktor Stribiżew

Reputation: 626926

You can use

=ARRAYFORMULA(REGEXEXTRACT(B3:B, "\s-\s+([^.]*?)\s*\."))

See the regex demo. Details:

  • \s-\s+ - a whitespace, -, one or more whitespaces
  • ([^.]*?) - Group 1: zero or more chars other than a . as few as possible
  • \s* - zero or more whitespaces
  • \. - a . char.

Upvotes: 1

Related Questions