Pankaj Jaju
Pankaj Jaju

Reputation: 5481

Replace all text in text file using regular expression

I have a text file with following text

161624.406 : Send:[sometext1]
161624.437 : Send:[sometext2]
161624.468 : Send:[sometext3]
161624.499 : Send:[sometext4]
161624.531 : Send:[sometext5]

I want to keep only the sometext part in that file. Desired output is

sometext1
sometext2
sometext3
sometext4
sometext5

I am using the following code in Excel-VBA

Public Sub testa()
    a = "C:\Users\pankaj.jaju\Desktop\test.log"

    Set objFSO = CreateObject("Scripting.FileSystemObject")
    Set objTxtFile = objFSO.OpenTextFile(a, 1)
    strText = objTxtFile.ReadAll
    objTxtFile.Close
    Set objTxtFile = Nothing


    Set objRegEx = CreateObject("VBScript.RegExp")
    With objRegEx
        .Global = True
        .MultiLine = True
        .Pattern = "\[([^]]+)\]"
        Set objRegMC = .Execute(strText)
        b = objRegMC(0).SubMatches(0)
    End With
    Set objRegEx = Nothing

    Debug.Print b
End Sub

The problem is the output is displayed as sometext1 only. How do I ReplaceAll in the text file and save the file with the desired text only.

Upvotes: 5

Views: 4496

Answers (4)

Rich
Rich

Reputation: 4170

I know you've already got your answer. But for a simple program like this, why even utilize regular expressions? Here's a 4 line script that can do the same thing - yes it has been tested.

Dim a, text: a = "C:\testenv\test.log"
text = split(CreateObject("Scripting.FileSystemObject").OpenTextFile(a, 1).ReadAll, vbcrlf)
for i=0 to ubound(text) : text(i) = replace(split(text(i), "[")(1), "]", "") : next
CreateObject("Scripting.FileSystemObject").OpenTextFile(a, 2).Write(Join(text, vbcrlf))

Upvotes: 2

Pankaj Jaju
Pankaj Jaju

Reputation: 5481

OK ... found a way to do it (Thanks MSDN ... sometimes the docs are useful :-)). Sharing the answer

Public Sub testa()
    a = "C:\Users\pankaj.jaju\Desktop\test.log"

    Set objFSO = CreateObject("Scripting.FileSystemObject")
    Set objTxtFile = objFSO.OpenTextFile(a, 1)
    strText = objTxtFile.ReadAll
    objTxtFile.Close
    Set objTxtFile = Nothing

    b = ""
    Set objRegEx = CreateObject("VBScript.RegExp")
    With objRegEx
        .Global = True
        .MultiLine = True
        .Pattern = "\[([^]]+)\]"
        Set objMatches = .Execute(strText)

        For Each Match In objMatches
            For Each SubMatch In Match.SubMatches
                b = b & SubMatch & vbCrLf
            Next SubMatch
        Next Match

    End With
    Set objRegEx = Nothing

    Set objTxtFile = objFSO.OpenTextFile(a, 2)
    objTxtFile.Write b
    objTxtFile.Close
    Set objTxtFile = Nothing
End Sub


EDIT - Based on @PortlandRunner's suggestion

Public Sub testa()
a = "C:\Users\pankaj.jaju\Desktop\test.log"

Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objTxtFile = objFSO.OpenTextFile(a, 1)
strText = objTxtFile.ReadAll
objTxtFile.Close
Set objTxtFile = Nothing

b = ""
Set objRegEx = CreateObject("VBScript.RegExp")
With objRegEx
    .Global = True
    .MultiLine = True
    .Pattern = "(.*?)(\[)([^]]+)(\])(.*?)"
    b = objRegEx.Replace(strText, "$3")
End With
Set objRegEx = Nothing

Set objTxtFile = objFSO.OpenTextFile(a, 2)
objTxtFile.Write b
objTxtFile.Close
Set objTxtFile = Nothing
End Sub

Upvotes: 0

Automate This
Automate This

Reputation: 31394

The regex.Replace method should do the trick.

Separate your pattern into groups like this: "(.*?)(\[)([^]]+)(\])(.*?)"

And now you can replace your input string with the matching group which is group three in this case: objRegEx.Replace(strText, "$3")


Here is a helpful link to different examples of Regex within Excel.

Upvotes: 3

ashareef
ashareef

Reputation: 1846

The regex obviously only grabbing the first match of the string. I'm not proficient enough with regex to come up with a regex only solution that takes into account \n

A simple workaround would be to use objTxtFile.ReadLine instead of ReadAll

Read each string in one by one, apply regex and get output.

Public Sub testa()
    a = "C:\Users\pankaj.jaju\Desktop\test.log"

    Set objFSO = CreateObject("Scripting.FileSystemObject")
    Set objTxtFile = objFSO.OpenTextFile(a, 1)

    Do Until (objTxtFile.AtEndOfStream) 'Loop till EOF
        strText = objTxtFile.ReadLine 'Single line read instead of ReadAll

        Set objRegex = CreateObject("VBScript.RegExp")
        With objRegex
            .Global = True
            .MultiLine = True
            .Pattern = "\[([^]]+)\]"
            Set objRegMC = .Execute(strText)
            b = objRegMC(0).SubMatches(0)
        End With
        Set objRegex = Nothing

        Debug.Print b 'replace this with the code to output the string where you want etc
    Loop
    objTxtFile.Close
End Sub

Upvotes: 0

Related Questions