Alex Smoke

Reputation: 659

Regex (PCRE) exclude certain words from match result

I need to match only the names in this string (shown in bold in the original post):

author={Trainor, Sarah F and Calef, Monika and Natcher, David and Chapin, F Stuart and McGuire, A David and Huntington, Orville and Duffy, Paul and Rupp, T Scott and DeWilde, La'Ona and Kwart, Mary and others},

Is there a way to exclude the words 'and' and 'others' from the match result?

I have tried a lot of things, but nothing works as I expect:

(?<=\{).+?(?<=and\s).+(?=\})

Upvotes: 1

Views: 78

Answers (2)

The fourth bird

Reputation: 163207

You could make use of \G and a capturing group to get the matches.

The values are in capturing group 1 of each match.

(?:author={|\G(?!^))([^\s,]+,(?:\h+[^\s,]+)+)\h+and\h+(?=[^{}]*\})

About the pattern

  • (?: Non capturing group
    • author={ Match literally
    • | Or
    • \G(?!^) Assert the position at the end of the previous match, not at the start of the string
  • ) Close non capturing group
  • ( Capture group 1
    • [^\s,]+, Match 1+ chars other than a whitespace char or a comma, then match a comma
    • (?:\h+[^\s,]+)+ Repeat 1+ times: 1+ horizontal whitespace chars followed by 1+ chars other than a whitespace char or a comma
  • ) Close group 1
  • \h+and\h+ Match the word and with 1+ horizontal whitespace chars on each side
  • (?=[^{}]*\}) Assert that a closing } follows on the right, with no { or } before it

Regex demo
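
If the same extraction is needed outside PCRE, note that Python's built-in re module supports neither \G nor \h, so the pattern above does not carry over as written. A minimal sketch of a different, two-step approach (take the text between the braces, split it on the word and, then drop others) might look like this. The function name extract_authors is hypothetical, and the sketch assumes the field always has the form author={...} with no nested braces:

```python
import re

line = ("author={Trainor, Sarah F and Calef, Monika and Natcher, David and "
        "Chapin, F Stuart and McGuire, A David and Huntington, Orville and "
        "Duffy, Paul and Rupp, T Scott and DeWilde, La'Ona and Kwart, Mary "
        "and others},")


def extract_authors(text):
    """Return the author names from an author={...} field (hypothetical helper)."""
    # Step 1: capture everything between the braces (assumes no nested braces).
    m = re.search(r"author=\{([^{}]*)\}", text)
    if m is None:
        return []
    # Step 2: split on the separator word "and" surrounded by whitespace.
    parts = re.split(r"\s+and\s+", m.group(1).strip())
    # Step 3: drop the "others" placeholder, which is not a name.
    return [p for p in parts if p != "others"]


authors = extract_authors(line)
for name in authors:
    print(name)
```

This is a splitting technique rather than a port of the \G pattern: it returns each name as a list item instead of as capturing group 1 of successive matches.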

Upvotes: 0

MonkeyZeus

Reputation: 20737

Instead of excluding words, you may be better off with a pattern that expects a specific name format, based on the examples you've provided:

([A-Z]+[A-Za-z]*('[A-Za-z]+)*, [A-Z]? ?[A-Z]+[A-Za-z]*('[A-Za-z]+)*( [A-Z])?)

https://regex101.com/r/9LGqn3/3
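
This pattern uses no PCRE-only syntax, so it can also be tried with Python's re module. A small sketch, using the sample line from the question; re.finditer is used instead of re.findall because the pattern contains several groups, and only group 1 (the whole name) is wanted:

```python
import re

line = ("author={Trainor, Sarah F and Calef, Monika and Natcher, David and "
        "Chapin, F Stuart and McGuire, A David and Huntington, Orville and "
        "Duffy, Paul and Rupp, T Scott and DeWilde, La'Ona and Kwart, Mary "
        "and others},")

# Pattern copied from the answer above; group 1 is the full "Last, First" name.
name_pattern = re.compile(
    r"([A-Z]+[A-Za-z]*('[A-Za-z]+)*, "
    r"[A-Z]? ?[A-Z]+[A-Za-z]*('[A-Za-z]+)*( [A-Z])?)"
)

# The words "and" and "others" start with a lowercase letter, so they never match.
names = [m.group(1) for m in name_pattern.finditer(line)]
for name in names:
    print(name)
```

Because the pattern relies on capitalisation, names in other formats (for example hyphenated or all-lowercase surnames) would need additional rules.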

Upvotes: 1
