Reputation: 1223
This is the PCRE2 regexp:
(?<=hello )(?:[^_]\w++)++
Its intended use is against strings like the following:
Hello Bob (Marius) Smith. -> Match "Bob"
Hello Bob Jr. (Joseph) White -> Match "Bob Jr."
Hello Bob Jr. IInd (Paul) Jobs -> Match "Bob Jr. IInd"
You get the point.
Essentially, there is a magic word, in this case "hello", followed by a first name, followed by a second name that is always between parens. First names could be anything, really: a single word, a list of words followed by punctuation, and so on. Heck, look at Elon Musk's kid's name (X Æ A-Xii) to see how weird names can get :)
Let's only assume ASCII, though. Æ is not in my targets :)
I'm at a loss as to how to convert this regex to JS, and the only viable solution I found was to use PCRE2-wasm on Node, which spins up a WASM virtual machine and sucks up 1 GB of resources just for that. That's insane.
Upvotes: 2
Views: 1461
Reputation: 163287
The ++ does not work, as JavaScript does not support possessive quantifiers.
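A quick way to see this in Node.js or a browser console is sketched below (the variable names are just for illustration). Compiling the PCRE2 pattern unchanged throws a SyntaxError, while the same pattern with single + quantifiers compiles. The last part shows the commonly used lookahead-plus-backreference trick, (?=(\w+))\1, which emulates a possessive \w++ because JavaScript lookarounds are atomic; it is offered as an aside, not as part of this answer's approach.

```javascript
// The question's PCRE2 pattern, with possessive ++ quantifiers.
const pcrePattern = String.raw`(?<=hello )(?:[^_]\w++)++`;

let possessiveError = null;
try {
  new RegExp(pcrePattern);
} catch (e) {
  // JavaScript rejects "++" (V8 reports "Nothing to repeat").
  possessiveError = e;
}
console.log(possessiveError instanceof SyntaxError);

// Same pattern with ordinary greedy + quantifiers: this compiles.
const greedy = new RegExp(pcrePattern.replace(/\+\+/g, "+"));
console.log(greedy.source);

// Emulating possessive \w++: the lookahead captures the whole run of word
// chars, and since the lookahead is atomic, \1 cannot give any of it back.
const possessiveLike = /^(?=(\w+))\1b/;
const plainGreedy = /^\w+b/;
console.log(plainGreedy.test("Bob"));    // true: \w+ backtracks to "Bo"
console.log(possessiveLike.test("Bob")); // false: "Bob" is consumed whole
```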
As the first name is followed by a second name that is always between parens, you might also use a capture group with a match instead of a lookbehind.
\b[Hh]ello (\w+.*?)\s*\([^()\s]+\)
\b[Hh]ello : match hello or Hello, followed by a space
( : capture group 1
    \w+.*? : match 1+ word chars followed by any chars, as few as possible
) : close group 1
\s*\([^()\s]+\) : match optional whitespace chars, then ( followed by 1+ chars that are neither parens nor whitespace, till )
const regex = /\b[Hh]ello (\w+.*?)\s*\([^()\s]+\)/;

[
  "Hello Bob (Marius) Smith.",
  "Hello Bob Jr. (Joseph) White",
  "Hello Bob Jr. IInd (Paul) Jobs"
].forEach(s => {
  const m = s.match(regex);
  if (m) {
    console.log(m[1]);
  }
});
With the lookbehind, you might also match word characters followed by an optionally repeated non-capturing group matching whitespace chars followed by word characters or dots.
(?<=[Hh]ello )\w+(?:\s+[\w.]+)*
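The lookbehind variant can be tried directly in JavaScript, since lookbehind assertions are supported in ES2018+ engines such as current Node.js and Chrome. A minimal sketch running it over the question's three sample strings (the names lookbehindRegex and firstNames are illustrative); because there is no capture group, the whole match m[0] is the first name:

```javascript
// Word chars, then optional runs of "whitespace + word chars/dots".
// The lookbehind (?<=[Hh]ello ) is checked but not included in the match.
const lookbehindRegex = /(?<=[Hh]ello )\w+(?:\s+[\w.]+)*/;

const samples = [
  "Hello Bob (Marius) Smith.",
  "Hello Bob Jr. (Joseph) White",
  "Hello Bob Jr. IInd (Paul) Jobs",
];

const firstNames = samples.map(s => {
  const m = s.match(lookbehindRegex);
  return m ? m[0] : null;
});

console.log(firstNames); // [ 'Bob', 'Bob Jr.', 'Bob Jr. IInd' ]
```

Matching stops at " (" because "(" is neither a word character nor a dot, so the parenthesized second name is never consumed.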
Upvotes: 0
Reputation: 7179
@Nils has the correct answer.
If you do need to expand your acceptable character set, you can use the following regex. Check it out. The g, m, and i flags are set.
(?<=hello ).*(?=\([^\)]*?\))
Hello Bob (Marius) Smith.
Hello Bob Jr. (Joseph) White
Hello Bob Jr. IInd (Paul) Jobs
Hello X Æ A-Xii (Not Elon) Musk
Hello Bob ()) Jr. ( (Darrell) Black
Match Number | Characters | Matched Text
Match 1 | 6-10 | Bob
Match 2 | 32-40 | Bob Jr.
Match 3 | 61-74 | Bob Jr. IInd
Match 4 | 92-102 | X Æ A-Xii
Match 5 | 124-138 | Bob ()) Jr. (
The idea is pretty simple:
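(?<=hello ) : look behind for "hello " (the i flag makes it match "Hello " as well)
.* : match anything
(?=\([^\)]*?\)) : look ahead for a set of parentheses (anything inside a set of parentheses that is not a closing parenthesis, lazily so you don't take part of the first name).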
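To reproduce the results table in JavaScript, here is a small sketch (variable names are illustrative) that runs the pattern with the same g, m and i flags over the sample lines joined with newlines. The start indices should line up with the Characters column above, and each raw match includes the space in front of "(", so it is trimmed.

```javascript
// The answer's pattern with its g, m and i flags. The i flag lets "hello"
// match "Hello"; m only changes ^ and $, which this pattern doesn't use.
const anyCharRegex = /(?<=hello ).*(?=\([^\)]*?\))/gmi;

// Sample text from the answer, one name per line.
const text = [
  "Hello Bob (Marius) Smith.",
  "Hello Bob Jr. (Joseph) White",
  "Hello Bob Jr. IInd (Paul) Jobs",
  "Hello X Æ A-Xii (Not Elon) Musk",
  "Hello Bob ()) Jr. ( (Darrell) Black",
].join("\n");

// matchAll requires the g flag. Greedy .* cannot cross a newline and backs
// off only as far as the last "(...)" group on the line.
const found = [...text.matchAll(anyCharRegex)].map(m => ({
  index: m.index,
  raw: m[0],          // includes the trailing space before "("
  name: m[0].trim(),
}));

for (const f of found) {
  console.log(f.index, JSON.stringify(f.raw), "->", f.name);
}
```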
Upvotes: 1
Reputation: 3001
This would match your cases in ECMAScript.
(?<=[Hh]ello )(?:[^_][\w.]+)+
You need to look for a capital H by using [Hh] instead of h, as your test cases start with a capital H, and your ++ needs to be a single + to be used in ECMAScript.
Also, you need to include a . with the \w, since it appears in some names.
https://regex101.com/r/lkZK7w/1
-- thanks "D M" for pointing out the missing . in the test case.
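The ECMAScript pattern can be checked against the question's three test strings with a short sketch like the one below (the cases table and variable names are illustrative, not from the answer). It prints "ok" next to each first name that matches the expected value.

```javascript
// The answer's ECMAScript pattern: single + instead of ++, [Hh] for the
// capital H, and "." allowed next to \w in the repeated group.
const ecmaRegex = /(?<=[Hh]ello )(?:[^_][\w.]+)+/;

// Test strings from the question, paired with the expected first names.
const cases = [
  ["Hello Bob (Marius) Smith.", "Bob"],
  ["Hello Bob Jr. (Joseph) White", "Bob Jr."],
  ["Hello Bob Jr. IInd (Paul) Jobs", "Bob Jr. IInd"],
];

const results = cases.map(([input, expected]) => {
  const m = input.match(ecmaRegex);
  const got = m ? m[0] : null;
  return { input, got, ok: got === expected };
});

for (const r of results) {
  console.log(r.ok ? "ok  " : "FAIL", JSON.stringify(r.got));
}
```

Note that [^_] accepts any single character except an underscore, including a space; that is what lets each repetition pick up " Jr." or " IInd", while the following [\w.]+ stops the match at "(".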
Upvotes: 3