Reputation:
I have the following sample string:
ABC__hdsiugid_23123_FGH1_sdfkjk_FGH2
What I would like to do is capture both FGH1 and FGH2 while ensuring that the match starts with ABC.
When I try the lazy pattern ABC.+?(FGH\d) I get FGH1, and with the greedy pattern ABC.+(FGH\d) I get FGH2. How can I modify the pattern to capture both FGH1 and FGH2?
Sub RexTest()
    Dim rex As New RegExp
    rex.Pattern = "ABC.+?(FGH\d)" ' or "ABC.+(FGH\d)"
    rex.Global = True
    Dim str As String: str = "ABC__hdsiugid_23123_FGH1_sdfkjk_FGH2"
    Dim mtch As Object
    For Each mtch In rex.Execute(str)
        Debug.Print mtch.SubMatches(0)
    Next
End Sub
Edit: I have realized that I should have made my question clearer (thanks sln). In the sample string I gave there are only two FGH[0-9] tokens, but in reality there could be an arbitrary number of them.
Upvotes: 2
Views: 186
Reputation:
You mentioned VSTO. If that is an option, you might be able to run a C# segment from VBA.
How you would marshal the results back is beyond me.
Anyway, here is a simple regex sample that uses capture collections, a feature that
should be in all engines but that, as far as I know, only .NET has.
Normally the capture buffer is overwritten each time the cluster group expression runs, but
.NET accumulates every result in an array.
Here it is ...
C# code
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using System.Globalization;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            Regex FghRx = new Regex(
                @"
                ^                 # Beginning of Line
                ABC               # Must be an 'ABC' at bol
                (?:               # START Cluster group
                    .*?           # optional non-'FGH' (and not newlines)
                    ( FGH \d+ )   # (1), The FGH Capture Collection
                )+                # END Cluster group, do many times
                "
                , RegexOptions.IgnorePatternWhitespace | RegexOptions.Multiline);

            string FghData =
                "ABC__hdsiugid_23123_FGH10_sdfkjk_FGH20 \n" +
                "ABC__hdsiugid_23123_FGH11_sdfkjk_FGH21_dopqw_FGH31 \n" +
                "ABC__hdsiugid_23123_FGH12_sdfkjk_FGH22_dopqw_FGH32 \n" +
                "333333__ABC__hdsiugid_23123_FGH120_sdfkjk_FGH220_dopqw_FGH320 \n" +
                "ABC__hdsiugid_23123_FGH13_sdfkjk_FGH23_dopqw_FGH33_dopqw_FGH43 \n" +
                "ABC__hdsiugid_23123_FGH14_sdfkjk_FGH24_dopqw_FGH34_dopqw_FGH44 \n" +
                "333333__ABC__hdsiugid_23123_FGH121_sdfkjk_FGH221_dopqw_FGH321 \n" +
                "ABC__hdsiugid_23123_FGH15_sdfkjk_FGH25_dopqw_FGH35 \n" +
                "ABC__hdsiugid_23123_FGH16_sdfkjk_FGH26_dopqw_FGH36 \n";

            Match FghMatch = FghRx.Match(FghData);
            while (FghMatch.Success)
            {
                Console.WriteLine("New Record\n------------------------");
                // Groups[1].Captures holds every capture of group 1, not just the last one
                CaptureCollection cc_fgh = FghMatch.Groups[1].Captures;
                for (int i = 0; i < cc_fgh.Count; i++)
                {
                    Console.WriteLine("'{0}'", cc_fgh[i].Value);
                }
                FghMatch = FghMatch.NextMatch();
                Console.WriteLine("------------------------\n");
            }
            return;
        }
    }
}
Output >>
New Record
------------------------
'FGH10'
'FGH20'
------------------------
New Record
------------------------
'FGH11'
'FGH21'
'FGH31'
------------------------
New Record
------------------------
'FGH12'
'FGH22'
'FGH32'
------------------------
New Record
------------------------
'FGH13'
'FGH23'
'FGH33'
'FGH43'
------------------------
New Record
------------------------
'FGH14'
'FGH24'
'FGH34'
'FGH44'
------------------------
New Record
------------------------
'FGH15'
'FGH25'
'FGH35'
------------------------
New Record
------------------------
'FGH16'
'FGH26'
'FGH36'
------------------------
Press any key to continue . . .
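For engines without capture collections, one possible workaround is to split the job in two: check the ABC prefix first, then collect every FGH token with a global search. The sketch below uses Python's re module purely for illustration, since it also keeps only the last capture of a repeated group. The helper names last_capture_only and all_fgh are made up for this example, and the same two steps should carry over to VBA with rex.Test and rex.Execute, though that is an assumption and is not shown here.

```python
import re


def last_capture_only(text):
    """Repeated group: only the final iteration's capture survives."""
    m = re.match(r"ABC(?:.*?(FGH\d+))+", text)
    return m.group(1) if m else None


def all_fgh(text):
    """Two-step workaround: verify the ABC prefix, then collect every FGH token."""
    if not text.startswith("ABC"):
        return []
    return re.findall(r"FGH\d+", text)


sample = "ABC__hdsiugid_23123_FGH1_sdfkjk_FGH2"
print(last_capture_only(sample))          # FGH2
print(all_fgh(sample))                    # ['FGH1', 'FGH2']
print(all_fgh("333333__ABC__x_FGH120"))   # [] (does not start with ABC)
```

The prefix check and the token search are kept as separate steps, so the number of FGH tokens in a line does not matter.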
Upvotes: 0
Reputation: 31035
You can use a regex like this:
^(?:(?!ABC).)*|(FGH\d)
MATCH 1
1. [20-24] `FGH1`
MATCH 2
1. [32-36] `FGH2`
MATCH 3
1. [51-55] `FGH3`
MATCH 4
1. [80-84] `FGH4`
MATCH 5
1. [92-96] `FGH5`
MATCH 6
1. [117-121] `FGH6`
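A minimal sketch of how this pattern can be consumed, using Python's re module for illustration (the function name fgh_after_abc is hypothetical). The first alternative swallows everything before the first ABC without capturing anything, so only matches where group 1 actually participated are kept.

```python
import re

# Alternative 1 discards text before the first ABC; alternative 2 captures a token.
PATTERN = re.compile(r"^(?:(?!ABC).)*|(FGH\d)")


def fgh_after_abc(text):
    """Return every FGH<digit> token found by the capturing alternative."""
    return [m.group(1) for m in PATTERN.finditer(text) if m.group(1)]


print(fgh_after_abc("ABC__hdsiugid_23123_FGH1_sdfkjk_FGH2"))  # ['FGH1', 'FGH2']
print(fgh_after_abc("xx_FGH1_yy"))                            # [] (no ABC at all)
print(fgh_after_abc("333333__ABC__x_FGH1"))                   # ['FGH1']
```

As the last call shows, this pattern does not require the string to start with ABC. It only ignores FGH tokens that come before the first ABC, so add a separate prefix check if a strict start-of-string ABC is needed.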
Upvotes: 1