Reputation: 480
I think an image is sometimes better than words.
My problem, as you can see, is that it only matches the words two at a time. How can I match all of the words?
My current regex (PCRE): ([^\|\(\)\|]+)\|([^\|\(\)\|]+)
The goal: retrieve all the words, each in its own separate group.
Upvotes: 2
Views: 414
Reputation: 163277
In C#, you can also repeat a single named capture group and read every value it captured from the group's Captures collection.
The matches are in the named group word:
\((?<word>\w+)(?:\|(?<word>\w+))*\)
\( - Match (
(?<word>\w+) - Match 1+ word chars in group word
(?: - Non-capturing group
\| - Match |
(?<word>\w+) - Match 1+ word chars in group word
)* - Close the non-capturing group and optionally repeat it to get all occurrences
\) - Match the closing parenthesis

Code example provided by Wiktor Stribiżew in the comments:
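using System;
using System.Linq;
using System.Text.RegularExpressions;

var line = "I love (chocolate|fish|honey|more)";
// Each match keeps every capture of the repeated group "word"
var output = Regex.Matches(line, @"\((?<word>\w+)(?:\|(?<word>\w+))*\)")
    .Cast<Match>()
    // Cast<Capture>() also compiles on .NET Framework, where CaptureCollection is non-generic
    .SelectMany(x => x.Groups["word"].Captures.Cast<Capture>());
foreach (var s in output)
    Console.WriteLine(s);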
Output
chocolate
fish
honey
more
Upvotes: 1
Reputation: 626748
You can use an infinite-length lookbehind in C# (together with a lookahead):
(?<=\([^()]*)\w+(?=[^()]*\))
To match any kind of string inside parentheses that does not contain (, ) or |, replace \w+ with [^()|]+:
(?<=\([^()]*)[^()|]+(?=[^()]*\))
//           ^^^^^^^
See the regex demo (and regex demo #2). Details:
(?<=\([^()]*) - a positive lookbehind that matches a location that is immediately preceded with ( and then zero or more chars other than ( and )
\w+ - one or more word chars
(?=[^()]*\)) - a positive lookahead that matches a location that is immediately followed with zero or more chars other than ( and ) and then a ) char.

Another way to capture these words is by using
(?:\G(?!^)\||\()(\w+)(?=[^()]*\)) // words as units consisting of letters/digits/diacritics/connector punctuation
(?:\G(?!^)\||\()([^()|]+)(?=[^()]*\)) // "words" that consist of any chars other than (, ) and |
See this regex demo. The words you need are now in Group 1. Details:
(?:\G(?!^)\||\() - a position after the previous match (\G(?!^)) and a | char (\|), or (|) a ( char (\()
(\w+) - Group 1: one or more word chars
(?=[^()]*\)) - a positive lookahead that makes sure there is a ) char after any zero or more chars other than ( and ) to the right of the current position.

Extracting the matches in C# can be done with
var matches = Regex.Matches(text, @"(?<=\([^()]*)\w+(?=[^()]*\))")
.Cast<Match>()
.Select(x => x.Value);
// Or
var matches = Regex.Matches(text, @"(?:\G(?!^)\||\()(\w+)(?=[^()]*\))")
.Cast<Match>()
.Select(x => x.Groups[1].Value);
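For items that are not plain words, the [^()|]+ variants of both patterns can be tried side by side. The following is only a minimal sketch: the sample string and the variable names viaLookbehind and viaAnchor are assumptions made for illustration, not part of the original question.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // Hypothetical input: the items deliberately contain spaces and '&',
        // which \w+ alone would split into several matches.
        var text = "I love (dark chocolate|fish & chips|honey)";

        // Variant 1: infinite-length lookbehind plus lookahead; the match value is the item.
        var viaLookbehind = Regex.Matches(text, @"(?<=\([^()]*)[^()|]+(?=[^()]*\))")
            .Cast<Match>()
            .Select(m => m.Value);

        // Variant 2: \G chaining; the item is in Group 1.
        var viaAnchor = Regex.Matches(text, @"(?:\G(?!^)\||\()([^()|]+)(?=[^()]*\))")
            .Cast<Match>()
            .Select(m => m.Groups[1].Value);

        Console.WriteLine(string.Join(", ", viaLookbehind)); // dark chocolate, fish & chips, honey
        Console.WriteLine(string.Join(", ", viaAnchor));     // dark chocolate, fish & chips, honey
    }
}
```

Both variants should give the same items here. The \G version is the one to try if the code ever has to move to an engine without infinite-length lookbehind, although the exact syntax supported there would need checking.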
Upvotes: 5