Reputation: 13
Firstly, I'm very new to Regex so my apologies if this is a dumb question.
I'm just using an online Regex tester https://regex101.com (PCRE) to build the following scenario.
I want to capture 123445
and ABC1234
from the following sentence
Foo Bar 123445 Ref ABC1234
I just wanted to use a simple capturing group
((?:\w)+)
which will identify 5 matching groups, and then I could back-reference them with $3
and $5
However, when I attempt a substitution with just one group, $3
, I end up with the whole string. I tried some of the other languages and ended up with
$3 $3 $3 $3 $3
In the end I just used Foo\s*Bar\s*(\w+)\s*Ref\s*(\w+)
and referencing groups $1
and $2
which works fine but just isn't very elegant.
Is it possible to create this kind of back referencing without specifically building capturing groups around each part of what you are trying to capture?
Thanks :)
Upvotes: 1
Views: 3916
Reputation: 7780
It isn't entirely clear what you are trying to match and what you want to substitute with based on your question.
For the purpose of trying to get an answer for you, I'm going to assume that you want to match any word that has a number and replace it with something else.
\w*?\d+\w*?
will, once wrapped in \b word boundaries, match any word with a digit in it. With JavaScript (you didn't specify a language), you can perform either a static substitution with a fixed string or a dynamic one with a replacer function.
const expression = /\b(\w*?\d+\w*?)\b/g;
const inputs = [
'Foo Bar 123445 Ref ABC1234',
'Hello World 123 Foo ABC123XYZ456'
];
// static string
console.log(inputs.map(i => i.replace(expression, '**redacted**')));
// dynamic string
console.log(inputs.map(i => i.replace(expression, s => '*'.repeat(s.length))));
Upvotes: 0
Reputation: 338128
((?:\w)+)
which will identify 5 matching groups, and then I could back-reference them with $3 and $5
No, that's not how backreferences work. There are exactly N groups in a regex, and N is the number of unescaped opening parentheses in the pattern.
In ((?:\w)+)
there are 2 groups, one "capturing" (which creates a backreference) and one "non-capturing" (which does not).
The number of times a group matches in a target string does not change the number of backreferences. Imagine the chaos this would create. Except for the most simplistic cases, how would you even know if what you're looking for is $3
, $9
or $9000
?
If your input string has a fixed structure, then your approach Foo\s*Bar\s*(\w+)\s*Ref\s*(\w+)
with $1
and $2
is perfectly fine.
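As a minimal sketch of that fixed-structure approach in JavaScript (the variable names are made up for illustration, and the replacement string '$2 / $1' is just an arbitrary example), the two groups can be used either in a substitution or read directly from the match:

```javascript
// The pattern from the question: exactly two capturing groups, so only $1 and $2 exist.
const fixedPattern = /Foo\s*Bar\s*(\w+)\s*Ref\s*(\w+)/;
const sentence = 'Foo Bar 123445 Ref ABC1234';

// Substitution: $1 and $2 refer to the first and second pair of parentheses.
const swapped = sentence.replace(fixedPattern, '$2 / $1');
console.log(swapped); // ABC1234 / 123445

// Reading the captures directly: index 0 is the whole match, 1 and 2 are the groups.
const [, number, ref] = sentence.match(fixedPattern);
console.log(number, ref); // 123445 ABC1234
```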
Is it possible to create this kind of back referencing without specifically building capturing groups around each part of what you are trying to capture?
No. You must build one capturing group for each part that you are trying to backreference to. If a group matches multiple times within a single match, its backreference holds only the last thing that group captured.
Some regex engines let you access each instance of what a particular group has captured from the host language. For example, the .NET regex engine does that. This is nice for post-processing, but the backreferences themselves (i.e. the $1
) still work as above.
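A small JavaScript sketch may make this concrete. The variable names are hypothetical, and the literal handling of an out-of-range $3 is how current JavaScript engines behave; other engines (such as PCRE on regex101) may treat it differently:

```javascript
const text = 'Foo Bar 123445 Ref ABC1234';

// One capturing group, repeated five times inside a single match.
const repeated = /(?:(\w+)\s*)+/.exec(text);
console.log(repeated.length - 1); // 1 (still only one group)
console.log(repeated[1]);         // ABC1234 (only the last capture is kept)

// The question's pattern with the g flag: five separate matches, each with one group.
// There is no group 3, so JavaScript leaves "$3" in place as literal text.
const literal = text.replace(/((?:\w)+)/g, '$3');
console.log(literal); // $3 $3 $3 $3 $3
```

The second result reproduces the "$3 $3 $3 $3 $3" output described in the question: each word is one match, and each match is replaced by the same literal text.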
All that being said, the way to get '123445'
and 'ABC1234'
out of Foo Bar 123445 Ref ABC1234
in the way you were thinking of is to avoid regex and use string.split()
at the spaces, taking the parts at indexes 2 and 4 (the third and fifth words).
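A rough sketch of the split approach in JavaScript, assuming the words are separated by single spaces (the variable names are illustrative only). Counting from zero, the two wanted values sit at indexes 2 and 4:

```javascript
const line = 'Foo Bar 123445 Ref ABC1234';

// Split on single spaces: ['Foo', 'Bar', '123445', 'Ref', 'ABC1234']
const words = line.split(' ');

// Array indexes are zero-based, so the third and fifth words are at 2 and 4.
const thirdWord = words[2];
const fifthWord = words[4];
console.log(thirdWord, fifthWord); // 123445 ABC1234
```

This gives the positional access the question was hoping to get from $3 and $5, without any capturing groups.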
Upvotes: 2