ar.dll

Reputation: 787

vbscript regex replace headache

I have a text file I'm trying to process with VBScript. It looks like this:

111 ,   ,       ,Yes    ,Yes
222 ,   ,       ,Yes    ,Yes
333 ,   ,       ,Yes    ,Yes
444 ,   ,       ,Yes    ,Yes
555 ,   ,       ,Yes    ,Yes
666 ,   ,       ,Yes    ,Yes

What I want is to remove the carriage returns, tabs, commas and 'Yes' values (i.e. the regex "\t,\t,\t\t,Yes\t,Yes") to get this output:

('111','222','333','444','555','666')

I'm using this code:

Const ForReading = 1
Const ForWriting = 2

Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objFile = objFSO.OpenTextFile(filePath, ForReading)

strText = objFile.ReadAll
objFile.Close
'chr(010) = line feed chr(013) = carriage return
strNewText = Replace(strText, "\t,\t,\t\t,Yes\t,Yes" & chr(013) & chr(010), "','") 

Set objFile = objFSO.OpenTextFile(filePath, ForWriting)
objFile.WriteLine strNewText
objFile.Close

This isn't giving the desired output, however. If I take the "\t,\t,\t\t,Yes\t,Yes" & part out of the Replace call, it removes the carriage returns, which is fine, but I also need the commas, tabs and 'Yes' values removed, and a (' at the start and ') at the end. I'm guessing the problem is the way I've used the regex, but I've not used much VBScript, so I'm not sure.
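A note on why the code above cannot work as written: VBScript's built-in Replace function does a plain, literal substring replacement. It has no regex support, and VBScript string literals have no escape sequences, so "\t" is simply a backslash followed by a t. To use a pattern you need a RegExp object. Below is a minimal sketch of the question's "remove what you don't want" idea done that way; the function name ToQuotedList is hypothetical, and the patterns assume every row is an id followed by comma-separated columns, with spaces or tabs as padding.

```vbscript
' Hypothetical helper: strips each row down to its id and builds ('id1','id2',...).
' Uses RegExp because the built-in Replace() is literal and knows no \t, \r, \n.
Function ToQuotedList(ByVal sText)
    Dim re : Set re = New RegExp
    re.Global = True

    ' 1) Drop everything from the first comma to the end of each line:
    '    padding before it, the empty columns and both "Yes" values.
    re.Pattern = "[ \t]*,[^\r\n]*"
    sText = re.Replace(sText, "")

    ' 2) Drop trailing line breaks/whitespace so no empty item is produced.
    '    Without MultiLine, $ only matches at the very end of the string.
    re.Pattern = "\s+$"
    sText = re.Replace(sText, "")

    ' 3) Turn the remaining line breaks (CRLF or LF) into the separator.
    re.Pattern = "\r?\n"
    sText = re.Replace(sText, "','")

    ToQuotedList = "('" & sText & "')"
End Function

Dim sSample
sSample = "111 ,   ,       ,Yes    ,Yes" & vbCrLf & _
          "222 ,   ,       ,Yes    ,Yes" & vbCrLf & _
          "333 ,   ,       ,Yes    ,Yes" & vbCrLf
WScript.Echo ToQuotedList(sSample)   ' ('111','222','333')
```

ByVal keeps the caller's string untouched; with VBScript's default ByRef, the reassignments to sText inside the function would otherwise overwrite sSample.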

Upvotes: 0

Views: 2783

Answers (2)

Ekkehard.Horner

Reputation: 38745

Instead of hunting down what you don't want, it's easier and less error-prone to concentrate on what you want:

  Dim sExp   : sExp   = "('111','222','333','444','555','666')"
  Dim aLines : aLines = Array( _
      "111 ,   ,       ,Yes    ,Yes" _
    , "222 ,   ,       ,Yes    ,Yes" _
    , "333 ,   ,       ,Yes    ,Yes" _
    , "444 ,   ,       ,Yes    ,Yes" _
    , "555 ,   ,       ,Yes    ,Yes" _
    , "666 ,   ,       ,Yes    ,Yes" _
  )     
  Dim sAll : sAll = Join( aLines, vbCrLf )
  WScript.Echo sAll
  Dim reCut : Set reCut = New RegExp
  reCut.Global    = True
  reCut.MultiLine = True
  reCut.Pattern   = "^\d+"
  Dim oMTS : Set oMTS = reCut.Execute( sAll )
  If 0 = oMTS.Count Then
     WScript.Echo "Bingo A!"
  Else
     ReDim aNums( oMTS.Count - 1 )
     Dim nI
     For nI = 0 To UBound( aNums )
         aNums( nI ) = oMTS( nI ).Value
     Next
     Dim sRes : sRes = "('" & Join( aNums, "','" ) & "')"    
     If sRes = sExp Then
        WScript.Echo "QED:", sRes
     Else   
        WScript.Echo "Bingo B!"
     End If
  End If

output:

111 ,   ,       ,Yes    ,Yes
222 ,   ,       ,Yes    ,Yes
333 ,   ,       ,Yes    ,Yes
444 ,   ,       ,Yes    ,Yes
555 ,   ,       ,Yes    ,Yes
666 ,   ,       ,Yes    ,Yes
QED: ('111','222','333','444','555','666')

Annotations:

I use an array to build my string to process (sAll). Your string (strText) comes from a file. So:

  Dim sAll : sAll = Join( aLines, vbCrLf )
  ==>
  Dim sAll : sAll = objFile.ReadAll
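Wired into the question's own read/process/write steps, the approach might look like the minimal sketch below. ExtractIds is a hypothetical wrapper around the ^\d+ logic above; the sample file is created in the temp folder only so the sketch runs on its own, and in the real script filePath would be the asker's existing path.

```vbscript
Const ForReading = 1
Const ForWriting = 2

' Hypothetical helper: the ^\d+ extraction from above; returns "" on no match.
Function ExtractIds(ByVal sAll)
    Dim reCut : Set reCut = New RegExp
    reCut.Global    = True
    reCut.MultiLine = True
    reCut.Pattern   = "^\d+"
    Dim oMTS : Set oMTS = reCut.Execute(sAll)
    ExtractIds = ""
    If oMTS.Count = 0 Then Exit Function
    Dim aNums() : ReDim aNums(oMTS.Count - 1)
    Dim nI
    For nI = 0 To UBound(aNums)
        aNums(nI) = oMTS(nI).Value
    Next
    ExtractIds = "('" & Join(aNums, "','") & "')"
End Function

Dim objFSO : Set objFSO = CreateObject("Scripting.FileSystemObject")
' Stand-in for the asker's filePath: a throwaway file in the temp folder (2).
Dim filePath
filePath = objFSO.BuildPath(objFSO.GetSpecialFolder(2).Path, objFSO.GetTempName())

Dim objFile : Set objFile = objFSO.OpenTextFile(filePath, ForWriting, True)
objFile.WriteLine "111 ,   ,       ,Yes    ,Yes"
objFile.WriteLine "222 ,   ,       ,Yes    ,Yes"
objFile.Close

' The question's read / process / write steps.
Set objFile = objFSO.OpenTextFile(filePath, ForReading)
Dim strText : strText = objFile.ReadAll
objFile.Close

Dim strNewText : strNewText = ExtractIds(strText)

Set objFile = objFSO.OpenTextFile(filePath, ForWriting)
objFile.WriteLine strNewText
objFile.Close

WScript.Echo strNewText      ' ('111','222')
objFSO.DeleteFile filePath   ' remove the throwaway sample file
```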

The string is parsed by a RegExp (reCut). Its pattern ^\d+ looks for a sequence (+) of digits (\d) at the start (^) of a line (not of the whole string; that's why the MultiLine attribute is set to True). The result of .Execute is a MatchCollection (oMTS) containing Match objects.
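The effect of the MultiLine flag can be checked in isolation. This small sketch (CountLeadingNumbers is a hypothetical name) counts the ^\d+ matches in a two-line string with the flag on and off.

```vbscript
' Hypothetical helper: counts ^\d+ matches with MultiLine switched on or off.
Function CountLeadingNumbers(ByVal sText, ByVal bMultiLine)
    Dim re : Set re = New RegExp
    re.Global    = True
    re.MultiLine = bMultiLine
    re.Pattern   = "^\d+"
    CountLeadingNumbers = re.Execute(sText).Count
End Function

Dim sTwoLines
sTwoLines = "111 ,Yes" & vbCrLf & "222 ,Yes"
WScript.Echo CountLeadingNumbers(sTwoLines, True)    ' 2: ^ also matches right after each \n
WScript.Echo CountLeadingNumbers(sTwoLines, False)   ' 1: ^ matches only at the start of the string
```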

To make the concatenation of the expected result easier, the values of the Matches are copied to an array (aNums).

The "('" & Join( aNums, "','" ) & "')" expression joins the array's elements with the separator ','; to complete the result, we only need a suitable head (' and tail ').
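To see why the head and tail are needed, compare Join's output with and without them; the array here is just sample data.

```vbscript
Dim aNums : aNums = Array("111", "222", "333")

' Join puts the separator only *between* elements: 111','222','333
WScript.Echo Join(aNums, "','")

' Adding the head (' and the tail ') completes the list: ('111','222','333')
Dim sRes : sRes = "('" & Join(aNums, "','") & "')"
WScript.Echo sRes
```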

Upvotes: 1

stema

Reputation: 92986

Try this

(.*?)(?:\s*,){3}Yes\s*,Yes\r?

You need to take care of the line breaks; on Regexr, \r was enough. I put the line break into the regex itself so that I could make it optional with the ? after it. Otherwise the last row would not be replaced if it does not end with a line break.

and replace it with

'$1',

This leaves an additional comma at the end. I'm not sure at the moment how best to handle that.

$1 is the content of the first capturing group; in your case, that is the part before the first comma.
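In VBScript, this pattern has two loose ends. One is the trailing comma. The other is that VBScript's RegExp "." does not match \n, so a pattern ending in \r? only consumes the CR of a CRLF line break and leaves the \n in the output. Below is a minimal sketch that keeps the pattern but consumes an optional \r?\n and then trims the final comma; ToQuotedListByReplace is a hypothetical name, and the rows are assumed to look exactly like the question's sample.

```vbscript
' Hypothetical helper built on the pattern above; only the line-break part
' and the final comma handling are changed.
Function ToQuotedListByReplace(ByVal sText)
    Dim re : Set re = New RegExp
    re.Global  = True
    ' Group 1 = text before the first comma (the id). The row's CRLF or LF is
    ' consumed as well, and is optional so a last row without one still matches.
    re.Pattern = "(.*?)(?:\s*,){3}Yes\s*,Yes(?:\r?\n)?"
    Dim sList : sList = re.Replace(sText, "'$1',")
    ' Every row contributed "'id',", so exactly one comma too many is left.
    If Right(sList, 1) = "," Then sList = Left(sList, Len(sList) - 1)
    ToQuotedListByReplace = "(" & sList & ")"
End Function

Dim sRows
sRows = "111 ,   ,       ,Yes    ,Yes" & vbCrLf & _
        "222 ,   ,       ,Yes    ,Yes" & vbCrLf & _
        "333 ,   ,       ,Yes    ,Yes"
WScript.Echo ToQuotedListByReplace(sRows)   ' ('111','222','333')
```

The lazy (.*?) stops at the shortest prefix that lets the rest of the row match, so $1 is just the digits without the padding before the first comma.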

See it here on Regexr

Upvotes: 0
