Eduardo Wada

Reputation: 2647

.Net Regex to replace duplicate occurrences of a pattern with capture group

I have a SQL script that goes something like this:

DECLARE @MyVariable1 = 1
DECLARE @MyVariable1 = 10
DECLARE @MyVariable3 = 15
DECLARE @MyVariable2 = 20
DECLARE @MyVariable1 = 7
DECLARE @MyVariable2 = 4
DECLARE @MyVariable4 = 7
DECLARE @MyVariable2 = 4

Of course, the real script has lots of other statements in between, but I want to write a function that, given the above input, outputs this:

DECLARE @MyVariable1 = 1
@MyVariable1 = 10
DECLARE @MyVariable3 = 15
DECLARE @MyVariable2 = 20
@MyVariable1 = 7
@MyVariable2 = 4
DECLARE @MyVariable4 = 7
@MyVariable2 = 4

In other words, I want to strip the DECLARE keyword from every statement whose variable has already been declared earlier in the script.

My current solution is this:

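    ' Requires: Imports System.Text, Imports System.Text.RegularExpressions
    Private Function RemoveDuplicateDeclarations(commandText As String) As String
        Dim lines = commandText.Split(New String() { vbCrLf }, StringSplitOptions.RemoveEmptyEntries)
        ' Each element of lines is a single line with no line breaks left in it,
        ' so anchor at the line start instead of requiring \r, \n or vbCrLf around DECLARE
        Dim declarationRegex As New Regex("^ *DECLARE +(?<initialization>(?<varname>[^ ]*) *.*)", RegexOptions.IgnoreCase)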
        Dim declaredVariables As New List(Of String) 
        Dim resultBuilder As New StringBuilder()

        For Each line In lines    
            Dim matches = declarationRegex.Matches(line)
            If matches.Count > 0 Then
                Dim varname = matches(0).Groups("varname").Value
                If declaredVariables.Contains(varname) Then
                    resultBuilder.AppendLine(declarationRegex.Replace(line, "${initialization}"))
                Else 
                    declaredVariables.Add(varname)

                    resultBuilder.AppendLine(line)
                End If
            Else
                resultBuilder.AppendLine(line)
            End If
        Next

        Return resultBuilder.ToString()
    End Function

It worked perfectly for my scripts (and there won't be any new scripts), but it seems a bit overcomplicated. Since I can already match the occurrences I want to replace, I was wondering whether there is a way to just call Regex.Replace() with the right arguments and accomplish this in one line.

C# solutions are welcome too.

-EDIT-

To clarify what I'm trying to achieve, I want an answer in the following format, or an explanation that it's impossible (modifying the regex is allowed).

Private Function RemoveDuplicateDeclarations(commandText As String) As String
    Dim regex As New Regex("(\r|\n|\r\n) *DECLARE *(?<initialization>(?<varname>[^ ]*) *.*)" & vbCrLf , RegexOptions.Multiline Or RegexOptions.IgnoreCase)
    Return regex.Replace(commandText, "What do I put here???????")
End Function
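For comparison (this is not from the original thread): .NET lookbehinds may be variable-length and may contain backreferences, so the "already declared earlier?" check can be folded into the pattern itself, and the replacement string is then simply empty. A minimal C# sketch, assuming DECLARE starts its line and variable names are @ followed by word characters; the class name DeclareDeduper is hypothetical:

```csharp
using System.Text.RegularExpressions;

static class DeclareDeduper
{
    // Matches "DECLARE " at the start of a line, but only when the variable
    // that follows was already declared on an earlier line.
    //   (?=(@\w+)\b)  captures the variable name into group 1 without consuming it
    //   (?<=...)      a .NET lookbehind (matched right to left) that uses \1 to look
    //                 for an earlier line starting with "DECLARE <same name>"
    static readonly Regex DuplicateDeclare = new Regex(
        @"^DECLARE[ \t]+(?=(@\w+)\b)(?<=^DECLARE[ \t]+\1\b[\s\S]*^DECLARE[ \t]+)",
        RegexOptions.Multiline);

    public static string RemoveDuplicateDeclarations(string commandText)
    {
        // Only the matched "DECLARE " prefix is removed; the rest of each line is untouched.
        return DuplicateDeclare.Replace(commandText, string.Empty);
    }
}
```

Because Regex.Replace evaluates every match against the original string, no loop is needed: a line whose DECLARE is being removed in the same call still counts as an earlier declaration. Each lookbehind rescans the text before it, so the cost grows roughly quadratically with script length, which should be acceptable for scripts of ordinary size. In the VB.NET template above this amounts to swapping in the pattern and returning regex.Replace(commandText, ""); add RegexOptions.IgnoreCase if a lowercase declare must also be handled.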

Upvotes: 2

Views: 351

Answers (2)

Wiktor Stribiżew

Reputation: 626950

You can use a fairly simple regex that finds a later DECLARE of an @-prefixed variable that was already declared on an earlier line, and keeps only the first DECLARE. Run it in a loop until there are no more matches.

(?sm)(^DECLARE\s+(@\w+\b).*?)^DECLARE\s+\2

Details:

  • (?sm) - enable Multiline (^ matches at the start of every line) and Singleline (DOTALL, . also matches newlines) modes
  • (^DECLARE\s+(@\w+\b).*?) - Group 1 capturing:
    • ^DECLARE - DECLARE at the start of a line
    • \s+ - 1 or more whitespace symbols
    • (@\w+\b) - Group 2 capturing @ and 1+ word chars up to the trailing word boundary
    • .*? - any 0+ chars, as few as possible, up to the first occurrence of...
  • ^DECLARE - a DECLARE substring at the beginning of a line
  • \s+ - 1+ whitespaces
  • \2 - a backreference to the value stored in Group 2

See the VB.NET demo:

' Requires: Imports System.Text.RegularExpressions
Dim rx As New Regex("(?sm)(^DECLARE\s+(@\w+\b).*?)^DECLARE\s+\2")
Dim s As String = "DECLARE @MyVariable1 = 1" & vbCrLf & "DECLARE @MyVariable1 = 10" & vbCrLf & "DECLARE @MyVariable3 = 15" & vbCrLf & "DECLARE @MyVariable2 = 20" & vbCrLf & "DECLARE @MyVariable1 = 7" & vbCrLf & "DECLARE @MyVariable2 = 4" & vbCrLf & "DECLARE @MyVariable4 = 7" & vbCrLf & "DECLARE @MyVariable2 = 4"
' Matches within one Replace call cannot overlap, so a single pass misses some
' duplicates; repeat until a pass leaves the text unchanged (ordinal comparison)
Dim res As String = s
Dim prev As String
Do
    prev = res
    res = rx.Replace(prev, "$1$2")
Loop While res <> prev
Console.WriteLine(res)

Output:

DECLARE @MyVariable1 = 1
@MyVariable1 = 10
DECLARE @MyVariable3 = 15
DECLARE @MyVariable2 = 20
@MyVariable1 = 7
@MyVariable2 = 4
DECLARE @MyVariable4 = 7
@MyVariable2 = 4
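To see why the loop is needed, here is a C# sketch (class and method names are hypothetical) that runs the same pattern once, and then until the text stops changing, counting the passes that actually changed something:

```csharp
using System.Text.RegularExpressions;

static class LoopingDedup
{
    // The pattern from the answer above, unchanged.
    static readonly Regex Rx = new Regex(@"(?sm)(^DECLARE\s+(@\w+\b).*?)^DECLARE\s+\2");

    // A single Replace call, to show what one pass leaves behind.
    public static string OnePass(string input) => Rx.Replace(input, "$1$2");

    // Repeats the replacement until the text is stable and reports how many
    // passes changed the text (the final, unchanged pass is not counted).
    public static string Run(string input, out int changingPasses)
    {
        changingPasses = 0;
        string current = input;
        while (true)
        {
            string next = Rx.Replace(current, "$1$2");
            if (next == current)
                return current;
            changingPasses++;
            current = next;
        }
    }
}
```

On the sample input, the first pass only fixes the 2nd and 6th lines: each match swallows everything from a declaration up to its first duplicate, so the 5th line (inside the @MyVariable2 match) and the 8th line (whose earlier declaration already started a match) must wait for later passes. Three changing passes are needed here. Separately, \2 has no trailing \b, so a DECLARE @MyVariable10 after a DECLARE @MyVariable1 would also lose its DECLARE; appending \b after \2 avoids that.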

Upvotes: 1

jdweng

Reputation: 34429

If you prefer a LINQ solution:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using System.IO;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            string input =
                "DECLARE @MyVariable1 = 1\n" +
                "DECLARE @MyVariable1 = 10\n" +
                "DECLARE @MyVariable3 = 15\n" +
                "DECLARE @MyVariable2 = 20\n" +
                "DECLARE @MyVariable1 = 7\n" +
                "DECLARE @MyVariable2 = 4\n" +
                "DECLARE @MyVariable4 = 7\n" +
                "DECLARE @MyVariable2 = 4\n";

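            // Capture every "@name = value" assignment in the script
            string pattern = @"@(?'name'[^\s]+)\s+=\s+(?'value'\d+)";

            MatchCollection matches = Regex.Matches(input, pattern);

            string[] lines = matches.Cast<Match>()
                // Remember each match's original position so the order can be restored later
                .Select((x, i) => new { name = x.Groups["name"].Value, value = x.Groups["value"].Value, index = i })
                // Group the assignments by variable name (input order is kept within each group)
                .GroupBy(x => x.name)
                // Only the first occurrence in each group keeps its DECLARE keyword
                .Select(x => x.Select((y, i) =>  new { 
                    index = y.index,  
                    s = i == 0 
                       ? string.Format("DECLARE @{0} = {1}", x.Key, y.value)  
                       : string.Format("@{0} = {1}", x.Key, y.value) 
                }))
                // Flatten the groups and put the lines back in their original order
                .SelectMany(x => x)
                .OrderBy(x => x.index)
                .Select(x => x.s)
                .ToArray();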

            foreach (string line in lines)
            {
                Console.WriteLine(line);
            }
            Console.ReadLine();

        }
    }
}
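One caveat with the LINQ version: it rebuilds its output only from the matched assignments, so any other SQL between the declarations (which the question says the real script has) is dropped, and the \d+ group only accepts integer values. A sketch that leaves everything else in place, assuming the same "DECLARE @name" layout at the start of a line, is a single Regex.Replace with a MatchEvaluator that records names in a HashSet<string> (the class name EvaluatorDedup is hypothetical):

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class EvaluatorDedup
{
    // Optional indentation, the DECLARE keyword, then the variable name.
    static readonly Regex Declare = new Regex(
        @"^(?<indent>[ \t]*)DECLARE[ \t]+(?<name>@\w+)",
        RegexOptions.Multiline | RegexOptions.IgnoreCase);

    public static string RemoveDuplicateDeclarations(string commandText)
    {
        // Names declared so far (ordinal, case-sensitive, like List(Of String).Contains).
        var declared = new HashSet<string>(StringComparer.Ordinal);

        return Declare.Replace(commandText, m =>
            // HashSet.Add returns false for a name seen before:
            // keep the first declaration as-is, drop "DECLARE " from repeats.
            declared.Add(m.Groups["name"].Value)
                ? m.Value
                : m.Groups["indent"].Value + m.Groups["name"].Value);
    }
}
```

This relies on Regex.Replace invoking the evaluator in left-to-right match order, which holds with the default (non-RightToLeft) options. Lines that are not DECLARE statements never match, so they pass through unchanged.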

Upvotes: 0
