Valentin Kuzub

Reputation: 12093

Regex ignore middle part of capture

I want a single regex that, when applied to "firstsecondthird", will match "firstthird" (in a single group, i.e. in C# Match.Value will be equal to "firstthird").

Is that possible? We can ignore a suffix or a prefix, but what about the middle?

Upvotes: 14

Views: 14001

Answers (4)

Burke Johnson

Reputation: 3

I know this question was asked several years ago at this point, but for the sake of anyone who is still coming here looking for the answer, there is a way, not like any of the other answers, that will exclude a part in the middle with only one expression.

The trick is to use 'non-capturing groups'. This feature allows one to search using an expression that includes a group that is not included in the result.

The syntax of this is as follows:

(?:Groups Contents)

This will be matched with the rest of the expression, but that group is not stored as a numbered capture. Note that the text it matches is still part of the overall match (Match.Value in C#).

e.g. If you apply the following expression to a list of names separated by newlines,

\w{2,} (?:Micheal |James )\w{2,}

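It will match every person with the middle names of 'Micheal' or 'James'. The middle name is not captured, but it is still part of the matched text; to read back only the first and last name, wrap those two parts in capturing groups.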

In the following list, the entries that match are 'Bob Micheal Jones', 'Joseph Micheal Hetton' and 'Bill James Johnson':

  • Duke Jamesson

  • Bob James

  • Bob Micheal Jones

  • James Anderson

  • Joseph Micheal Hetton

  • Bill James Johnson

  • George Ronald McCarthy
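A quick way to check what this pattern reports in .NET is sketched below. It is only an illustration: the two capturing groups around the first and last name are an addition that is not in the expression above, and the variable names are made up. Under those assumptions, Match.Value still spans the middle name, while the non-capturing group simply gets no slot in Groups.

```csharp
using System;
using System.Text.RegularExpressions;

// Pattern from the answer above, with capturing groups added around the
// first and last name (those two groups are an assumption for illustration).
string pattern = @"(\w{2,}) (?:Micheal |James )(\w{2,})";
Match m = Regex.Match("Bob Micheal Jones", pattern);

// The overall match is contiguous, so it still contains the middle name.
string whole = m.Value;

// Groups holds group 0 (the whole match) plus the two capturing groups;
// the (?:...) group is not numbered.
int groupCount = m.Groups.Count;

// First and last name have to be glued together from the separate groups.
string firstAndLast = m.Groups[1].Value + " " + m.Groups[2].Value;

// "Bob James" has no word after the middle-name alternative, so no match.
bool bobJames = Regex.IsMatch("Bob James", pattern);

Console.WriteLine(whole);        // Bob Micheal Jones
Console.WriteLine(groupCount);   // 3
Console.WriteLine(firstAndLast); // Bob Jones
Console.WriteLine(bobJames);     // False
```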

Upvotes: -3

tripleee

Reputation: 189487

No, there is no facility to make a single match group containing non-contiguous text from the target string. You will need to use replace, or glue together the matching groups into a new string.
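As a rough sketch of the "glue the groups together" option in C#, using the sample string from the question (the pattern and variable names are illustrative, not something the asker supplied):

```csharp
using System;
using System.Text.RegularExpressions;

// Sample input from the question; the pattern captures the two outer parts.
string input = "firstsecondthird";
Match m = Regex.Match(input, "^(first)second(third)$");

// Build the non-contiguous result by concatenating the captured groups.
// If the pattern does not match, fall back to the unchanged input.
string glued = m.Success
    ? m.Groups[1].Value + m.Groups[2].Value
    : input;

Console.WriteLine(glued);    // firstthird
Console.WriteLine(m.Value);  // firstsecondthird (the match itself stays contiguous)
```

The second line of output is the point of the answer: Match.Value always covers one contiguous stretch of the input, so the shortened string has to be assembled afterwards.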

Upvotes: 9

Code Jockey

Reputation: 6721

AFAIK, this is not possible with a single regular expression. You will have to use a call to Regex.Replace(), as follows:

// requires: using System.Text.RegularExpressions;
string inputVar = "firstsecondthird";
string resultVar = Regex.Replace(inputVar, "^(first)second(third)$", "$1$2"); // "firstthird"

The call returns a string, so it can typically be used inline wherever an expression is expected.

Upvotes: 2

duncan

Reputation: 31912

Do you want to match a string that starts with 'first', has zero or more other characters, then ends with 'third'? Is that what you mean?

"^first(.*)third$"

Or, do you mean if you find a string 'firstsecondthird', ditch everything apart from 'first' and 'third'?

replace("^(first)second(third)$", "$1$2")
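To make the difference between the two readings concrete, here is a small C# sketch. It assumes the .NET Regex class and the sample string from the question; the `.*` in the second pattern is an assumed generalization so that any middle part is dropped, not only the literal 'second'.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "firstsecondthird";

// Reading 1: match the whole string and capture whatever sits in the middle.
Match m = Regex.Match(input, "^first(.*)third$");
string whole = m.Value;             // the full input
string middle = m.Groups[1].Value;  // only the middle part

// Reading 2: drop the middle and keep the outer parts.
// ".*" (instead of the literal "second") is an assumption for illustration.
string trimmed = Regex.Replace(input, "^(first).*(third)$", "$1$2");

Console.WriteLine(whole);    // firstsecondthird
Console.WriteLine(middle);   // second
Console.WriteLine(trimmed);  // firstthird
```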

Upvotes: 9
