Reputation: 61
I have a text file that contains NUL characters in some rows. In each row I want to find the first NUL character and delete the rest of the row from that character onward, as shown below:
Input:
1 2 3 4 20170821NUL20170821NULNULNULNUL 123 NULNULNUL
1 2 3 4 20170821 20170821 6 7 10 123 10 11 13
1 2 3 4 20170821NUL20170821NULNULNULNUL 123 NULNULNUL
1 2 3 4 20170821NUL20170821NULNULNULNUL 123 NULNULNUL
Output:
1 2 3 4 20170821
1 2 3 4 20170821 20170821 6 7 10 123 10 11 13
1 2 3 4 20170821
1 2 3 4 20170821
I use the following script to read the file into a variable, loop through the data, and replace the NULs:
sInfile = WScript.Arguments(1)
'Create file system object
Set oFSO = CreateObject("Scripting.FileSystemObject")
Set oFS = oFSO.OpenTextFile(sInfile)
sData = oFS.ReadAll
oFS.Close
Set oFS = Nothing
MsgBox("File Read Completed")
'Remove Rest of the line from NULL
Do While InStr(sData, "\00.*") > 0
sData = Replace(sData, "\00.*", "")
Loop
'Cleanup and end
Set oFS = Nothing
WScript.Quit
The script ran without any errors, but I can't see any changes to the data.
EDIT 1: Updated code:
Const ForReading = 1
Const ForWriting = 2
Const TriStateUseDefault = -2
If (WScript.Arguments.Count > 0) Then
sInfile = WScript.Arguments(0)
Else
WScript.Echo "No filename specified."
WScript.Quit
End If
If (WScript.Arguments.Count > 1) Then
sOutfile = WScript.Arguments(1)
Else
sOutfile = sInfile
End If
'Get the text file from cmd file
sInfile = Wscript.Arguments(1)
' Create file system object
Set oFSO = CreateObject("Scripting.FileSystemObject")
Set oFS = oFSO.OpenTextFile(sInfile)
sData = oFS.ReadAll
oFS.Close
Set oFS = Nothing
' Remove Rest of the line from NULL
Set re = New RegExp
re.Pattern = Chr(0) & ".*"
re.Global = True
sData = re.Replace(sData, "")
Set oOutfile = oFSO.OpenTextFile(sOutfile, ForWriting, True)
oOutfile.Write(sData)
oOutfile.Close
Set oOutfile = Nothing
' Cleanup and end
Set oFS = Nothing
WScript.Quit
Here is the sample input I am giving:
I would like to see the output as below:
But I got the below output:
ਊਊਊਊਊਊਊਊਊਊ
EDIT 2: I am not familiar with hex editors. Here is a hex dump of the sample input:
FF FE 4A 00 42 00 43 00 09 00 31 00 32 00 33 00 34 00 38 00 36 00 37 00 38
00 09 00 38 00 37 00 09 00 30 00 09 00 30 00 09 00 31 00 32 00 33 00 09 00
32 00 30 00 31 00 37 00 09 00 31 00 32 00 33 00 34 00 09 00 31 00 33 00 34
00 32 00 30 00 09 00 32 00 30 00 31 00 37 00 30 00 38 00 30 00 39 00 09 00
35 00 31 00 30 00 33 00 09 00 09 00 09 00 09 00 33 00 34 00 31 00 34 00 38
00 38 00 09 00 32 00 09 00 32 00 30 00 31 00 37 00 09 00 38 00 09 00 31 00
09 00 37 00 09 00 2D 00 32 00 36 00 34 00 30 00 09 00 2D 00 33 00 39 00 33
00 2E 00 31 00 36 00 31 00 33 00 37 00 35 00 09 00 2D 00 33 00 33 00 32 00
2E 00 34 00 36 00 38 00 35 00 37 00 39 00 09 00 41 00 30 00 31 00 31 00 32
00 35 00 38 00 39 00 2F 00 33 00 34 00 31 00 34 00 38 00 38 00 2F 00 09 00
09 00 09 00 09 00 09 00 09 00 09 00 09 00 32 00 09 00 09 00 09 00 32 00 31
00 37 00 38 00 31 00 09 00 58 00 59 00 5A 00 09 00 58 00 59 00 5A 00 09 00
58 00 59 00 5A 00 09 00 31 00 32 00 33 00 09 00 31 00 32 00 33 00 09 00 2D
00 32 00 36 00 34 00 09 00 58 00 59 00 5A 00 09 00 31 00 09 00 31 00 09 00
31 00 32 00 33 00 09 00 09 00 09 00 32 00 31 00 37 00 38 00 32 00 31 00 0D
00 0A 00 41 00 42 00 43 00 09 00 31 00 32 00 33 00 34 00 38 00 36 00 37 00
And here is the hex dump of the output I got:
FF FE 4A 0A 0A 0A 0A 0A 0A 0A 0A 0A 0A 0A 0A 0A 0A 0A 0A 0A 0A 0A 0A 0A 0A
Upvotes: 2
Views: 4143
Reputation: 200373
The Replace function doesn't do regular expression replacements, and VBScript doesn't recognize \0 as the NUL character. For the former you need the Replace method of a regular expression object; for the latter you need the Chr function. You also don't need a loop, since you read the content of the file as a single string anyway.
However, your file is apparently UTF-16 LE encoded, which means that each character is represented by 2 bytes, one of which is zero for ASCII characters. If you read such a file as ANSI, every other character is a NUL, so your replacement removes almost everything: the second byte of the very first character is already a NUL. You need to set the 4th parameter of the OpenTextFile method to -1 in order to handle the file as a UTF-16 (vulgo Unicode) file.
Change this:
Set oFS = oFSO.OpenTextFile(sInfile)
sData = oFS.ReadAll
oFS.Close
Set oFS = Nothing
...
Do While InStr(sData, "\00.*") > 0
sData = Replace(sData, "\00.*", "")
Loop
...
Set oOutfile = oFSO.OpenTextFile(sOutfile, ForWriting, True)
oOutfile.Write(sData)
oOutfile.Close
Set oOutfile = Nothing
into this:
sData = oFSO.OpenTextFile(sInfile, 1, False, -1).ReadAll
Set re = New RegExp
re.Pattern = Chr(0) & "[^\r\n]*"
re.Global = True
sData = re.Replace(sData, "")
oFSO.OpenTextFile(sOutfile, 2, True, -1).Write sData
and the problem will disappear.
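If the script has to cope with both ANSI and UTF-16 LE input, one option is to look for the byte order mark (the FF FE at the start of the hex dump in the question) before choosing the format argument. The following is only a sketch: DetectFormat is a hypothetical helper, not part of the FileSystemObject API, and it assumes a single-byte ANSI code page such as Windows-1252, where the bytes FF and FE read back as character codes 255 and 254.

```vbscript
Option Explicit

Const ForReading = 1
Const TristateTrue = -1    ' open as UTF-16 ("Unicode")
Const TristateFalse = 0    ' open as ANSI

' Hypothetical helper: returns TristateTrue if the file starts with the
' UTF-16 LE byte order mark (FF FE), otherwise TristateFalse.
' Assumption: the system ANSI code page is single-byte (e.g. Windows-1252).
Function DetectFormat(oFSO, sPath)
    Dim oTS, sHead
    DetectFormat = TristateFalse
    ' Files shorter than 2 bytes cannot contain a BOM.
    If oFSO.GetFile(sPath).Size < 2 Then Exit Function
    ' Read the first two bytes as ANSI characters.
    Set oTS = oFSO.OpenTextFile(sPath, ForReading, False, TristateFalse)
    sHead = oTS.Read(2)
    oTS.Close
    If Asc(Left(sHead, 1)) = 255 And Asc(Right(sHead, 1)) = 254 Then
        DetectFormat = TristateTrue
    End If
End Function

Dim oFSO, oTS, sInfile, sData

If WScript.Arguments.Count = 0 Then
    WScript.Echo "No filename specified."
    WScript.Quit 1
End If
sInfile = WScript.Arguments(0)

Set oFSO = CreateObject("Scripting.FileSystemObject")
Set oTS = oFSO.OpenTextFile(sInfile, ForReading, False, DetectFormat(oFSO, sInfile))
sData = ""
If Not oTS.AtEndOfStream Then sData = oTS.ReadAll   ' ReadAll fails on an empty file
oTS.Close

WScript.Echo "Read " & Len(sData) & " characters"
```

A UTF-16 file without a BOM would still be opened as ANSI by this check, so it is a convenience rather than a reliable encoding detector.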
The pattern [^\r\n]* (any number of characters that are neither carriage-return nor line-feed) is used to keep Windows line breaks intact. Those consist of the two characters carriage-return and line-feed (CR-LF). The regular expression meta-character . does not match line-feeds, but it does match carriage-return, so those would be removed when using the pattern .* instead.
For clarity: the above code will remove a NUL character and the remainder of the line from each line containing a NUL character. Lines not containing NUL characters will not be affected.
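To see the effect on line breaks, the two patterns can be compared on a small in-memory string. This is a minimal sketch with made-up sample data; Visible is a hypothetical helper that only makes CR and LF readable, and the pattern uses the \x00 escape (as in the other answer) instead of Chr(0).

```vbscript
Option Explicit

Dim sSample, re

' Hypothetical helper: makes line-break characters visible in the output.
Function Visible(s)
    Visible = Replace(Replace(s, vbCr, "<CR>"), vbLf, "<LF>")
End Function

' Two CR-LF terminated lines; only the first one contains a NUL.
sSample = "1 2 3" & Chr(0) & "junk" & vbCrLf & "4 5 6" & vbCrLf

Set re = New RegExp
re.Global = True

' Variant 1: "." also consumes the carriage-return after the NUL.
re.Pattern = "\x00.*"
WScript.Echo Visible(re.Replace(sSample, ""))

' Variant 2: stops before CR and LF, so the line break stays intact.
re.Pattern = "\x00[^\r\n]*"
WScript.Echo Visible(re.Replace(sSample, ""))
```

Run with cscript, the first variant should leave the first line ending in a bare line-feed, while the second should keep CR-LF on both lines. The second line contains no NUL and should come out unchanged in both cases.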
If you want the entire text after a NUL character removed (including subsequent lines) you could do it like this:
Set re = New RegExp
re.Pattern = Chr(0) & "[\s\S]*"
sData = re.Replace(sData, "")
Upvotes: 1
Reputation: 12612
You are passing a regex pattern to the Replace() function, which won't work because it only does literal string replacement. In fact, you don't need a regex at all here.
Here is a non-regex version:
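' Format -1 opens the file as UTF-16, which the hex dump in the question indicates.
With CreateObject("Scripting.FileSystemObject").OpenTextFile(WScript.Arguments(1), 1, False, -1)
sData = ""
' ReadAll raises an error on an empty file, so check first.
If Not .AtEndOfStream Then sData = .ReadAll
.Close
End With
' Process the text line by line.
a = Split(sData, vbCrLf)
For i = 0 To UBound(a)
' Position of the first NUL in this line (0 if there is none).
q = Instr(a(i), Chr(0))
' Keep only the part before the first NUL.
If q > 0 Then a(i) = Mid(a(i), 1, q - 1)
Next
sData = Join(a, vbCrLf)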
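And here is a regex version:
' Format -1 opens the file as UTF-16, which the hex dump in the question indicates.
With CreateObject("Scripting.FileSystemObject").OpenTextFile(WScript.Arguments(1), 1, False, -1)
sData = ""
If Not .AtEndOfStream Then sData = .ReadAll
.Close
End With
With CreateObject("VBScript.RegExp")
' Capture everything before the first NUL of a line, then match the rest of
' the line. [^\r\n] is used instead of . so the CR of a CR-LF pair is kept.
.Pattern = "^([^\r\n]*?)\x00[^\r\n]*"
.Global = True
.Multiline = True
sData = .Replace(sData, "$1")
End With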
Upvotes: 1