user1704812

Multi-matches with Regular Expressions

I have the following sample string: ABC__hdsiugid_23123_FGH1_sdfkjk_FGH2.

What I would like to do is capture both FGH1 and FGH2 while ensuring that the match starts with ABC.

When I try the lazy pattern ABC.+?(FGH\d) I get FGH1, and with the greedy pattern ABC.+(FGH\d) I get FGH2. How can I modify the pattern to capture both FGH1 and FGH2?

Sub RexTest()
    Dim rex As New RegExp ' early binding: needs a reference to "Microsoft VBScript Regular Expressions 5.5"
    rex.Pattern = "ABC.+?(FGH\d)" ' or "ABC.+(FGH\d)"
    rex.Global = True
    Dim str As String: str = "ABC__hdsiugid_23123_FGH1_sdfkjk_FGH2"
    Dim mtch As Object
    For Each mtch In rex.Execute(str)
        Debug.Print mtch.SubMatches(0)
    Next
End Sub
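The loop prints a single value because every match has to begin with ABC and the sample contains only one ABC. Once that one match has consumed it, the global search finds nothing further, so the lazy and greedy variants differ only in which FGH the single match ends on. A minimal sketch reproducing both results, written in C# 9+ (top-level statements) on the assumption that .NET's Regex treats these simple patterns the same way VBScript's RegExp does; the helper name FirstGroups is made up for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

const string sample = "ABC__hdsiugid_23123_FGH1_sdfkjk_FGH2";

// Hypothetical helper: collect group 1 from every match,
// mirroring the VBA loop over rex.Execute(str) with Global = True.
List<string> FirstGroups(string pattern, string input)
{
    var result = new List<string>();
    foreach (Match m in Regex.Matches(input, pattern))
        result.Add(m.Groups[1].Value);
    return result;
}

// Lazy: .+? stops at the first FGH\d it can reach, and no second ABC follows.
Console.WriteLine(string.Join(",", FirstGroups(@"ABC.+?(FGH\d)", sample))); // FGH1

// Greedy: .+ runs to the end, then backtracks to the last FGH\d.
Console.WriteLine(string.Join(",", FirstGroups(@"ABC.+(FGH\d)", sample)));  // FGH2
```

Either way there is exactly one match, so one captured FGH.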

Edit: I have realized that I should have made my question clearer (thanks sln). In the sample string I gave there are only two FGH[0-9]'s, but in reality there could be an arbitrary number of them.

Upvotes: 2

Views: 186

Answers (2)

user557597

You mentioned VSTO. If you can use that, you might be able to run a C# segment from VBA, although how you would marshal the results back is beyond me.

Anyway, here is a really simple regex sample that uses capture collections, a feature that should be in all engines but that, as far as I know, only .NET has.

Normally, a capture group's buffer is overwritten each time its enclosing cluster group repeats, so only the text from the last iteration survives. .NET instead accumulates every capture in an array, the group's CaptureCollection.
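A minimal C# 9+ sketch (top-level statements) of that difference on the question's own sample string, using a condensed, single-line version of the pattern; Group.Value and Group.Captures are the standard .NET members, and the variable names are only illustrative:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// ABC at the start, then one or more "anything, then FGH<digits>" repetitions.
var match = Regex.Match("ABC__hdsiugid_23123_FGH1_sdfkjk_FGH2", @"^ABC(?:.*?(FGH\d+))+");

// Value holds only the capture from the final repetition.
string last = match.Groups[1].Value;

// Captures keeps every repetition's capture, in order.
string[] all = match.Groups[1].Captures.Cast<Capture>().Select(c => c.Value).ToArray();

Console.WriteLine($"Value:    {last}");                    // Value:    FGH2
Console.WriteLine($"Captures: {string.Join(", ", all)}");  // Captures: FGH1, FGH2
```

The Cast<Capture>() call is there so the LINQ query also compiles against older .NET Framework versions, where CaptureCollection is only a non-generic collection.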

Here it is ...

C# code

using System;
using System.Text.RegularExpressions;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            Regex FghRx = new Regex(
            @"
                 ^                   # Beginning of Line
                 ABC                 # Must be an 'ABC' at bol
                 (?:                 # START Cluster group
                      .*?                 # optional non-'FGH' (and not newlines)
                      ( FGH \d+ )         # (1), The FGH Capture Collection
                 )+                  # END Cluster group, do many times
            "
            , RegexOptions.IgnorePatternWhitespace | RegexOptions.Multiline);

            string FghData =
                "ABC__hdsiugid_23123_FGH10_sdfkjk_FGH20                             \n" +
                "ABC__hdsiugid_23123_FGH11_sdfkjk_FGH21_dopqw_FGH31                 \n" +
                "ABC__hdsiugid_23123_FGH12_sdfkjk_FGH22_dopqw_FGH32                 \n" +
                "333333__ABC__hdsiugid_23123_FGH120_sdfkjk_FGH220_dopqw_FGH320      \n" +
                "ABC__hdsiugid_23123_FGH13_sdfkjk_FGH23_dopqw_FGH33_dopqw_FGH43     \n" +
                "ABC__hdsiugid_23123_FGH14_sdfkjk_FGH24_dopqw_FGH34_dopqw_FGH44     \n" +
                "333333__ABC__hdsiugid_23123_FGH121_sdfkjk_FGH221_dopqw_FGH321      \n" +
                "ABC__hdsiugid_23123_FGH15_sdfkjk_FGH25_dopqw_FGH35                 \n" +
                "ABC__hdsiugid_23123_FGH16_sdfkjk_FGH26_dopqw_FGH36                 \n" ; 

            Match FghMatch = FghRx.Match( FghData );
            while ( FghMatch.Success )
            {
                Console.WriteLine( "New Record\n------------------------" );
                CaptureCollection cc_fgh = FghMatch.Groups[1].Captures;

                for (int i = 0; i < cc_fgh.Count; i++)
                {
                    Console.WriteLine( "'{0}'", cc_fgh[i].Value );
                }
                FghMatch = FghMatch.NextMatch();
                Console.WriteLine( "------------------------\n" );
            }
            return;

        }
    }
}

Output >>

New Record
------------------------
'FGH10'
'FGH20'
------------------------

New Record
------------------------
'FGH11'
'FGH21'
'FGH31'
------------------------

New Record
------------------------
'FGH12'
'FGH22'
'FGH32'
------------------------

New Record
------------------------
'FGH13'
'FGH23'
'FGH33'
'FGH43'
------------------------

New Record
------------------------
'FGH14'
'FGH24'
'FGH34'
'FGH44'
------------------------

New Record
------------------------
'FGH15'
'FGH25'
'FGH35'
------------------------

New Record
------------------------
'FGH16'
'FGH26'
'FGH36'
------------------------


Upvotes: 0

Federico Piazza

Reputation: 31035

You can use a regex like this:

^(?:(?!ABC).)*|(FGH\d)
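At each line start, the first alternative consumes everything before the first ABC, or the whole line when it contains no ABC, so FGH tokens on such lines are never reached. Every other FGH is then picked up by the second alternative in a global search. Note that ABC only has to appear somewhere on the line, not at its very start. Matches of the first alternative have no group 1, so the calling code must skip them. A minimal C# 9+ sketch, assuming the pattern runs with the multiline option so ^ matches at every line start (in VBA that would presumably be rex.MultiLine = True alongside rex.Global = True, skipping matches whose SubMatches(0) is empty); the three-line input is made up for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical input: the question's sample, a line without ABC, and another ABC line.
string input = "ABC__hdsiugid_23123_FGH1_sdfkjk_FGH2\n" +
               "xyz__FGH7_FGH8\n" +
               "ABC__q_FGH3";

var rx = new Regex(@"^(?:(?!ABC).)*|(FGH\d)", RegexOptions.Multiline);

var found = new List<string>();
foreach (Match m in rx.Matches(input))
{
    // Matches of the first alternative only consume text before "ABC"
    // (or the whole line if it has none); group 1 is unset for those.
    if (m.Groups[1].Success)
        found.Add(m.Groups[1].Value);
}

Console.WriteLine(string.Join(", ", found)); // FGH1, FGH2, FGH3
```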

Working demo


MATCH 1
1.  [20-24] `FGH1`
MATCH 2
1.  [32-36] `FGH2`
MATCH 3
1.  [51-55] `FGH3`
MATCH 4
1.  [80-84] `FGH4`
MATCH 5
1.  [92-96] `FGH5`
MATCH 6
1.  [117-121] `FGH6`

Upvotes: 1
