S.ai

Reputation: 61

Find and replace NUL character using VBScript

I have a text file that contains NUL characters in random rows. I want to find the first NUL character in each row and delete the rest of that row, starting from the NUL character, as shown below:

Input:

1 2 3 4 20170821NUL20170821NULNULNULNUL 123 NULNULNUL
1 2 3 4 20170821 20170821 6 7 10 123 10 11 13
1 2 3 4 20170821NUL20170821NULNULNULNUL 123 NULNULNUL
1 2 3 4 20170821NUL20170821NULNULNULNUL 123 NULNULNUL

Output:

1 2 3 4 20170821
1 2 3 4 20170821 20170821 6 7 10 123 10 11 13
1 2 3 4 20170821
1 2 3 4 20170821

I have the following code, which reads the text file into a variable and then loops over the data to replace NUL:

sInfile = WScript.Arguments(1)

'Create file system object
Set oFSO = CreateObject("Scripting.FileSystemObject")
Set oFS = oFSO.OpenTextFile(sInfile)
sData = oFS.ReadAll
oFS.Close
Set oFS = Nothing

MsgBox("File Read Completed")

'Remove Rest of the line from NULL
Do While InStr(sData, "\00.*") > 0
    sData = Replace(sData, "\00.*", "")
Loop

'Cleanup and end
Set oFS = Nothing
WScript.Quit

The script ran without any errors, but I can't see any changes to the data.

EDIT 1: Updated code:

Const ForReading = 1
Const ForWriting = 2
Const TriStateUseDefault = -2

If (WScript.Arguments.Count > 0) Then
    sInfile = WScript.Arguments(0)
Else
    WScript.Echo "No filename specified."
    WScript.Quit
End If
If (WScript.Arguments.Count > 1) Then
    sOutfile = WScript.Arguments(1)
Else
    sOutfile = sInfile
End If

'Get the text file from cmd file
sInfile = Wscript.Arguments(1)
' Create file system object
Set oFSO = CreateObject("Scripting.FileSystemObject")
Set oFS = oFSO.OpenTextFile(sInfile)
sData = oFS.ReadAll
oFS.Close
Set oFS = Nothing

' Remove Rest of the line from NULL
Set re = New RegExp
re.Pattern = Chr(0) & ".*"
re.Global  = True
sData = re.Replace(sData, "")

Set oOutfile = oFSO.OpenTextFile(sOutfile, ForWriting, True)
oOutfile.Write(sData)
oOutfile.Close
Set oOutfile = Nothing

' Cleanup and end
Set oFS = Nothing
WScript.Quit

Here is the sample input I am giving:

[image: sample input]

I would like to see the output as below:

[image: expected output]

But I got the below output:

੊ਊਊਊਊਊਊਊਊਊਊ

EDIT 2: I am not familiar with hex editors. Here is a hex dump of the sample input:

FF FE 4A 00 42 00 43 00 09 00 31 00 32 00 33 00 34 00 38 00 36 00 37 00 38 
00 09 00 38 00 37 00 09 00 30 00 09 00 30 00 09 00 31 00 32 00 33 00 09 00 
32 00 30 00 31 00 37 00 09 00 31 00 32 00 33 00 34 00 09 00 31 00 33 00 34 
00 32 00 30 00 09 00 32 00 30 00 31 00 37 00 30 00 38 00 30 00 39 00 09 00 
35 00 31 00 30 00 33 00 09 00 09 00 09 00 09 00 33 00 34 00 31 00 34 00 38 
00 38 00 09 00 32 00 09 00 32 00 30 00 31 00 37 00 09 00 38 00 09 00 31 00 
09 00 37 00 09 00 2D 00 32 00 36 00 34 00 30 00 09 00 2D 00 33 00 39 00 33 
00 2E 00 31 00 36 00 31 00 33 00 37 00 35 00 09 00 2D 00 33 00 33 00 32 00 
2E 00 34 00 36 00 38 00 35 00 37 00 39 00 09 00 41 00 30 00 31 00 31 00 32 
00 35 00 38 00 39 00 2F 00 33 00 34 00 31 00 34 00 38 00 38 00 2F 00 09 00 
09 00 09 00 09 00 09 00 09 00 09 00 09 00 32 00 09 00 09 00 09 00 32 00 31 
00 37 00 38 00 31 00 09 00 58 00 59 00 5A 00 09 00 58 00 59 00 5A 00 09 00 
58 00 59 00 5A 00 09 00 31 00 32 00 33 00 09 00 31 00 32 00 33 00 09 00 2D 
00 32 00 36 00 34 00 09 00 58 00 59 00 5A 00 09 00 31 00 09 00 31 00 09 00 
31 00 32 00 33 00 09 00 09 00 09 00 32 00 31 00 37 00 38 00 32 00 31 00 0D 
00 0A 00 41 00 42 00 43 00 09 00 31 00 32 00 33 00 34 00 38 00 36 00 37 00

And here is the hex dump of the output I got:

FF FE 4A 0A 0A 0A 0A 0A 0A 0A 0A 0A 0A 0A 0A 0A 0A 0A 0A 0A 0A 0A 0A 0A 0A

Upvotes: 2

Views: 4143

Answers (2)

Ansgar Wiechers

Reputation: 200373

The Replace function doesn't do regular expression replacements, and VBScript also doesn't recognize \0 as the character NUL. For the former you need the Replace method of a regular expression object, for the latter you need the Chr function. Also, you don't need a loop, since you read the content of the file as a single string anyway.
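The difference between the two approaches can be sketched on a single line of text. This is only an illustration: the sample string and the variable names are made up, and the script is meant to be run with cscript.

```vbscript
' Hypothetical sample line: text, a NUL character, then trailing data.
sLine = "1 2 3 4 20170821" & Chr(0) & "20170821"

' The Replace function looks for the literal text \00.* which does not
' occur in the string, so the string is returned unchanged.
sLiteral = Replace(sLine, "\00.*", "")
WScript.Echo "NUL position after Replace function: " & InStr(sLiteral, Chr(0))

' A RegExp object interprets its pattern as a regular expression, and
' Chr(0) supplies the actual NUL character for that pattern.
Set re = New RegExp
re.Pattern = Chr(0) & ".*"
re.Global = True
sRegex = re.Replace(sLine, "")
WScript.Echo "NUL position after RegExp.Replace:   " & InStr(sRegex, Chr(0))
WScript.Echo sRegex
```

A position greater than zero means the NUL character is still in the string; zero means it has been removed.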

However, your file is apparently UTF-16 LE encoded (note the FF FE byte order mark at the start of the hex dump), which means that each character is represented by 2 bytes, one of which is zero for ASCII characters. If you read such a file as an ANSI file, your replacement removes everything after the first byte, up to the next line feed. You need to set the 4th parameter (format) of the OpenTextFile method to -1 (TristateTrue) in order to handle the file as a UTF-16 (vulgo Unicode) file.
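One way to check the encoding issue on your own file is to read it in both modes and count the NUL characters each mode produces. The following is a rough sketch, assuming the file path is passed as the first argument, the file is not empty, and the system ANSI code page is a single-byte one; CountNul is a hypothetical helper, not a built-in function.

```vbscript
Const ForReading = 1
Const TristateTrue = -1    ' open the file as UTF-16 ("Unicode")
Const TristateFalse = 0    ' open the file as ANSI

sInfile = WScript.Arguments(0)   ' assumption: path given as first argument
Set oFSO = CreateObject("Scripting.FileSystemObject")

' Read the same file once per mode.
sAnsi = oFSO.OpenTextFile(sInfile, ForReading, False, TristateFalse).ReadAll
sUnicode = oFSO.OpenTextFile(sInfile, ForReading, False, TristateTrue).ReadAll

WScript.Echo "NULs when read as ANSI:   " & CountNul(sAnsi)
WScript.Echo "NULs when read as UTF-16: " & CountNul(sUnicode)

' Hypothetical helper: number of NUL characters in a string.
Function CountNul(s)
    CountNul = Len(s) - Len(Replace(s, Chr(0), ""))
End Function
```

For a UTF-16 file that contains mostly ASCII text, the ANSI count should be roughly half the file size, while the UTF-16 count should only reflect NUL characters that are really part of the data.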

Change this:

Set oFS = oFSO.OpenTextFile(sInfile)
sData = oFS.ReadAll
oFS.Close
Set oFS = Nothing
...
Do While InStr(sData, "\00.*") > 0
    sData = Replace(sData, "\00.*", "")
Loop
...
Set oOutfile = oFSO.OpenTextFile(sOutfile, ForWriting, True)
oOutfile.Write(sData)
oOutfile.Close
Set oOutfile = Nothing

into this:

sData = oFSO.OpenTextFile(sInfile, 1, False, -1).ReadAll

Set re = New RegExp
re.Pattern = Chr(0) & "[^\r\n]*"
re.Global  = True
sData = re.Replace(sData, "")

oFSO.OpenTextFile(sOutfile, 2, True, -1).Write sData

and the problem will disappear.

The pattern [^\r\n]* (any number of characters that are neither carriage-return nor line-feed) is used to keep Windows line breaks intact. Those consist of the two characters carriage-return and line-feed (CR-LF). The regular expression meta-character . does not match line-feeds, but it does match carriage-return, so those would be removed when using the pattern .*.
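The effect of the two patterns on line breaks can be compared with a small experiment. This is a sketch on made-up data; the sample string and variable names are hypothetical.

```vbscript
' Hypothetical two-line sample with Windows line breaks (CR-LF).
sData = "A" & Chr(0) & "x" & vbCrLf & "B" & vbCrLf

Set re = New RegExp
re.Global = True

' Pattern using the dot meta-character.
re.Pattern = Chr(0) & ".*"
sDot = re.Replace(sData, "")

' Pattern using the negated character class.
re.Pattern = Chr(0) & "[^\r\n]*"
sClass = re.Replace(sData, "")

' Count the carriage returns that survive in each result.
WScript.Echo "CRs left with .*       : " & CountCr(sDot)
WScript.Echo "CRs left with [^\r\n]* : " & CountCr(sClass)

' Hypothetical helper: number of carriage returns in a string.
Function CountCr(s)
    CountCr = Len(s) - Len(Replace(s, vbCr, ""))
End Function
```

If the behaviour described above holds, the first count should be one lower than the second, because the dot also consumed the carriage return on the line that contained the NUL character.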


For clarity: the above code will remove a NUL character and the remainder of the line from each line containing a NUL character. Lines not containing NUL characters will not be affected.

If you want the entire text after a NUL character removed (including subsequent lines) you could do it like this:

Set re = New RegExp
re.Pattern = Chr(0) & "[\s\S]*"
sData = re.Replace(sData, "")

Upvotes: 1

omegastripes

Reputation: 12612

You are trying to pass a regex pattern to the Replace() function, which won't work. In fact, you don't need regex at all.

Here is non-regex code:

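' Format -1 (TristateTrue) opens the file as UTF-16, which is what the hex dump in the question shows
With CreateObject("Scripting.FileSystemObject").OpenTextFile(WScript.Arguments(1), 1, False, -1)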
    sData = ""
    If Not .AtEndOfStream Then sData = .ReadAll
    .Close
End With

a = Split(sData, vbCrLf)
For i = 0 To UBound(a)
    q = Instr(a(i), Chr(0))
    If q > 0 Then a(i) = Mid(a(i), 1, q - 1)
Next
sData = Join(a, vbCrLf)
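The per-line step of the loop above can also be isolated into a small function, which makes it easy to try on individual strings. This is a minimal sketch; TruncateAtNul is a hypothetical name and the sample lines are taken from the question's example.

```vbscript
' Hypothetical helper: return the part of a line before its first NUL
' character, or the whole line if it contains no NUL.
Function TruncateAtNul(sLine)
    Dim q
    q = InStr(sLine, Chr(0))
    If q > 0 Then
        TruncateAtNul = Left(sLine, q - 1)
    Else
        TruncateAtNul = sLine
    End If
End Function

' A line with a NUL is cut at the NUL; a line without one is returned as-is.
WScript.Echo TruncateAtNul("1 2 3 4 20170821" & Chr(0) & "20170821")
WScript.Echo TruncateAtNul("1 2 3 4 20170821 20170821 6 7")
```

Note that splitting on vbCrLf assumes the file uses Windows line breaks throughout.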

And here is regex version:

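' Format -1 (TristateTrue) opens the file as UTF-16, which is what the hex dump in the question shows
With CreateObject("Scripting.FileSystemObject").OpenTextFile(WScript.Arguments(1), 1, False, -1)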
    sData = ""
    If Not .AtEndOfStream Then sData = .ReadAll
    .Close
End With

With CreateObject("VBScript.RegExp")
    ' [^\r\n]* instead of .*$ so that the CR of a CR-LF line break is not removed
    .Pattern = "^(.*?)\x00[^\r\n]*"
    .Global  = True
    .Multiline  = True
    sData = .Replace(sData, "$1")
End With

Upvotes: 1
