jomegaA

Reputation: 133

Regular expression to include starting string

I have managed to match the parenthesized groups using the expression below, but it is incomplete.

\([^\)]*\)

Example strings:

s11(h 1 1 c)(h 1 1 c) x="" y="" z="" phi="" theta=""

e(45,10,h 1 1 c,1,cross,max) x="" y="" z="" phi="" theta=""

With the above expression, I can match (h 1 1 c)(h 1 1 c) and (45,10,h 1 1 c,1,cross,max).

But I also want to capture the starting strings s11 and e, along with (h 1 1 c)(h 1 1 c) and (45,10,h 1 1 c,1,cross,max).

Upvotes: 1

Views: 70

Answers (1)

Wiktor Stribiżew

Reputation: 626748

You can use:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

var lines = new List<string> { "s11(h 1 1 c)(h 1 1 c) x=\"\" y=\"\" z=\"\" phi=\"\" theta=\"\"",
"e(45,10,h 1 1 c,1,cross,max) x=\"\" y=\"\" z=\"\" phi=\"\" theta=\"\""};
foreach (var s in lines)
{
    Console.WriteLine("==== Next string: \"" + s + "\" =>");
    // Whole match only: a word followed by one or more (...) groups
    Console.WriteLine(string.Join(", ",
            Regex.Matches(s, @"\w+(?:\([^()]*\))+").Cast<Match>().Select(x => x.Value)));

    Console.WriteLine("=== With groups and captures:");
    // Group 1 holds the leading word; Group 2 is repeated, so read its Captures
    var results = Regex.Matches(s, @"(\w+)(?:(\([^()]*\)))+");
    foreach (Match m in results)
    {
        Console.WriteLine(m.Groups[1].Value);
        Console.WriteLine(string.Join(", ", m.Groups[2].Captures.Cast<Capture>().Select(z => z.Value)));
    }
}

See the C# demo. Output:

==== Next string: "s11(h 1 1 c)(h 1 1 c) x="" y="" z="" phi="" theta=""" =>
s11(h 1 1 c)(h 1 1 c)
=== With groups and captures:
s11
(h 1 1 c), (h 1 1 c)
==== Next string: "e(45,10,h 1 1 c,1,cross,max) x="" y="" z="" phi="" theta=""" =>
e(45,10,h 1 1 c,1,cross,max)
=== With groups and captures:
e
(45,10,h 1 1 c,1,cross,max)

Depending on what exact results you want to get, you may use a regex with or without capturing groups:

\w+(?:\([^()]*\))+
(\w+)(?:(\([^()]*\)))+

See the regex 1 demo and regex 2 demo.

Details

  • \w+ - one or more word chars (letters, digits and some connector punctuation)
  • (?:\([^()]*\))+ - one or more repetitions of
    • \( - a ( char
    • [^()]* - zero or more chars other than ( and )
    • \) - a ) char.
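
If only the token at the very start of each line is of interest, the second pattern can be anchored with ^ and wrapped in a small helper. This is only a sketch under that assumption: ParseHead is a hypothetical name that does not appear in the answer above, and the anchoring is an addition. It relies on the same behavior as the demo, namely that a repeated capturing group keeps every repetition in its Captures collection.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper (an assumption, not from the answer): returns the leading
// word of a line and the list of parenthesized groups that directly follow it.
static (string Name, List<string> Groups) ParseHead(string line)
{
    // ^ restricts the match to the start of the line (an added assumption).
    var headMatch = Regex.Match(line, @"^(\w+)(\([^()]*\))+");
    if (!headMatch.Success)
        return ("", new List<string>());

    // Group 2 is repeated, so Captures holds each (...) in order of appearance.
    var parts = headMatch.Groups[2].Captures.Cast<Capture>()
                         .Select(c => c.Value).ToList();
    return (headMatch.Groups[1].Value, parts);
}

var head = ParseHead("s11(h 1 1 c)(h 1 1 c) x=\"\" y=\"\"");
Console.WriteLine(head.Name + " -> " + string.Join(" | ", head.Groups));
// prints: s11 -> (h 1 1 c) | (h 1 1 c)
```

Without the ^ anchor, the same pattern would also match any later word that is immediately followed by parentheses, which may or may not be what you want.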

Upvotes: 1
