Alan

Reputation: 23

Regex to match nth occurrence and return part of the next line of a string

I am using Zapier to extract names from the body of an email, and I need a regex that returns the nth occurrence of a name in the string for each step of the Zap.

Here is the DEMO I am working with.

For example, for the 2nd step of the Zap, I want to return 'John Morlu, CPA, CFE, CIA, CMA, CGFM, PMP, CISA' from the string.

The names will be different each time the regex runs, so the regex must return the text that sits between each instance of '.....................................' and ' ('.

So far I have:

(?mis)\A(?:.?^.[ ]+){2}(.?)(?=[ ]*()

but I am stuck.

How do I adjust the code to return the 1st, 2nd, or 3rd name in the list?

Thanks!

Upvotes: 0

Views: 188

Answers (2)

Cary Swoveland

Reputation: 110665

To obtain the information of interest for the 3rd person on the list, you can use the following regex:

(?:(?:^.*\s)*?\.{20}\.*\s+){3}\K.[^\)]*\)

As the regex engine was not specified, I used PCRE (PHP).

Demo

I assumed the dividing lines of periods (37 per line in the example) contain at least 20 periods.

The regex performs the following operations.

(?:             # begin a non-cap group
  (?:^.*\s)     # match entire line, incl. newline, in a non-cap group
  *?            # execute the non-cap group 0+ times, non-greedily
  \.{20}\.*\s+  # match entire line of 20+ periods, incl.
                #   the newline, followed by 0+ empty lines
)               # end the non-cap group
{3}             # execute the non-cap group 3 times
\K              # forget everything matched so far
[^\)]*          # match 0+ chars other than ')'
\)              # match ')'
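For anyone running this outside PCRE, here is a minimal Python sketch of the same pattern. Python's built-in re module has no \K, so this sketch swaps it for a capturing group and reads group 1 instead. It also passes re.MULTILINE, assuming the demo had the m flag on, because ^ has to match at the start of every line for the lazy "skip whole lines" group to work. The email body is a made-up sample (the question's real text is not reproduced here), and Jane Roe and Alex Poe are placeholder names.

```python
import re
from typing import Optional

# Hypothetical sample body: rows of 37 periods, each followed by a line
# holding a name and a parenthesised role.
DIVIDER = "." * 37
BODY = "\n".join([
    "Attendees",
    DIVIDER,
    "Jane Roe, CPA (Partner)",
    "Other details",
    DIVIDER,
    "John Morlu, CPA, CFE, CIA, CMA, CGFM, PMP, CISA (CEO)",
    DIVIDER,
    "",
    "Alex Poe (Manager)",
])


def nth_entry(text: str, n: int) -> Optional[str]:
    """Return the n-th entry (1-based) after a divider, up to and including ')'."""
    if n < 1:
        raise ValueError("n must be 1 or greater")
    # Same structure as the PCRE regex; the part after \K becomes group 1.
    pattern = rf"(?:(?:^.*\s)*?\.{{20}}\.*\s+){{{n}}}(.[^\)]*\))"
    match = re.search(pattern, text, re.MULTILINE)
    return match.group(1) if match else None


print(nth_entry(BODY, 3))  # Alex Poe (Manager)
```

Note that, like the PCRE version, this returns the parenthesised part as well; the regex would need to stop before ' (' to return the bare name.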

Upvotes: 0

Grismar

Reputation: 31319

If you need the n-th match only (instead of a list of matches to select the n-th from), this gets the first match:

(?:\.{37}\s+.+?\(.*?){0}\.{37}\s+(.+?)\(

This gets the second:

(?:\.{37}\s+.+?\(.*?){1}\.{37}\s+(.+?)\(

And the third:

(?:\.{37}\s+.+?\(.*?){2}\.{37}\s+(.+?)\(

Etc.

Basic explanation:
- The first half, the group starting with ?:, is a non-capturing group: it has to be matched, but it is not part of what is captured.
- The {x} after it makes that group match x times, so to get the first name it is matched 0 times, for the second 1 time, and so on.
- The rest of the expression matches the same pattern once more, but this time with a capturing group, and group 1 is what you're after.
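A rough Python equivalent of the {x} approach, building the pattern for a given n, might look like this. Assumption: the email body spans several lines, so this sketch passes re.DOTALL; without it, the .*? inside the non-capturing group cannot cross a line break to reach the next row of periods. The sample body and the names Jane Roe and Alex Poe are hypothetical, and .strip() removes the space the capture keeps before ' ('.

```python
import re
from typing import Optional

# Hypothetical sample body with dividers of exactly 37 periods.
DIVIDER = "." * 37
BODY = "\n".join([
    "Attendees",
    DIVIDER,
    "Jane Roe, CPA (Partner)",
    "Other details",
    DIVIDER,
    "John Morlu, CPA, CFE, CIA, CMA, CGFM, PMP, CISA (CEO)",
    DIVIDER,
    "",
    "Alex Poe (Manager)",
])


def nth_name(text: str, n: int) -> Optional[str]:
    """Return the n-th name (1-based) between a 37-period divider and ' ('."""
    if n < 1:
        raise ValueError("n must be 1 or greater")
    skip = n - 1  # {0} for the 1st name, {1} for the 2nd, and so on
    pattern = rf"(?:\.{{37}}\s+.+?\(.*?){{{skip}}}\.{{37}}\s+(.+?)\("
    # re.DOTALL lets .*? run across line breaks to the next divider.
    match = re.search(pattern, text, re.DOTALL)
    return match.group(1).strip() if match else None


print(nth_name(BODY, 2))  # John Morlu, CPA, CFE, CIA, CMA, CGFM, PMP, CISA
```

With DOTALL on, a section that has no ' (' before the next divider would let the lazy match spill into the following section, so this relies on every entry containing ' ('.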

To just get a list to select from:

\.{37}\s+(.+?)\(
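In Python, the list version could look like the sketch below, again using a hypothetical sample body with placeholder names. re.findall returns the group-1 text of every match, so the n-th name is at index n - 1. No DOTALL flag is needed here: \s+ already crosses the line break after the periods, and (.+?) only has to reach the ( on the same line.

```python
import re

# Hypothetical sample body with dividers of exactly 37 periods.
DIVIDER = "." * 37
BODY = "\n".join([
    "Attendees",
    DIVIDER,
    "Jane Roe, CPA (Partner)",
    "Other details",
    DIVIDER,
    "John Morlu, CPA, CFE, CIA, CMA, CGFM, PMP, CISA (CEO)",
    DIVIDER,
    "",
    "Alex Poe (Manager)",
])

# One entry per divider; strip() drops the space before " (".
names = [name.strip() for name in re.findall(r"\.{37}\s+(.+?)\(", BODY)]

print(names[1])  # John Morlu, CPA, CFE, CIA, CMA, CGFM, PMP, CISA
```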

Upvotes: 2
