BertB

Reputation: 118

Remove NUL characters from a text file using VBScript

I have text files of approximately 6 MB each. Some lines contain the NULL character (Chr(0)), which I would like to remove. I have two methods for doing this. The first reads one character at a time and tests Asc() = 0; it works but takes approximately 50 s to complete. The second reads line by line and uses InStr(line, Chr(0)) to find the NULLs; it is fast (about 4 s), but it strips vital information from the lines that contain NULL characters.

First line of text file as example:

@@MMCIBN.000NULL7NULL076059NULL7653NULL1375686349NULL2528NULL780608NULL10700NULL\NULL_NC_ACT.DIR\CFG_RESET.INI

First method (works but VERY slow)

function normalise (textFile )

Set fso = CreateObject("Scripting.FileSystemObject")
writeTo = fso.BuildPath(tempFolder, saveTo & ("\Output.arc"))
Set objOutFile = fso.CreateTextFile(writeTo)
Set objFile = fso.OpenTextFile(textFile,1)

Do Until objFile.AtEndOfStream 
    strCharacters = objFile.Read(1)
    If Asc(strCharacters) = 0 Then
        objOutFile.Write ""
        nul = true
    Else
        if nul = true then
            objOutFile.Write(VbLf & strCharacters)
        else
            objOutFile.Write(strCharacters)
        end if
    nul = false
    End If
Loop

objOutFile.close
end function

The output looks like this:

@@MMCIBN.000
7
076059
7653
1375686349
2528
780608
10700
\
_NC_ACT.DIR\CFG_RESET.INI

Second method code:

filename = WScript.Arguments(0)

Set fso = CreateObject("Scripting.FileSystemObject")

sDate = Year(Now()) & Right("0" & Month(now()), 2) & Right("00" & Day(Now()), 2)
file = fso.BuildPath(fso.GetFile(filename).ParentFolder.Path, saveTo & "Output " & sDate & ".arc")
Set objOutFile = fso.CreateTextFile(file)
Set f = fso.OpenTextFile(filename)

Do Until f.AtEndOfStream
    line = f.ReadLine

    If (InStr(line, Chr(0)) > 0) Then 
        line = Left(line, InStr(line, Chr(0)) - 1) & Right(line, InStr(line, Chr(0)) + 1)
    end if

    objOutFile.WriteLine line

Loop

f.Close

but then the output is:

@@MMCIBN.000\CFG_RESET.INI

Can someone please guide me on how to remove the NULLs quickly without losing information? I have thought about using the second method to find which line numbers need updating and then feeding those to the first method to speed things up, but quite honestly I have no idea where to even start. Thanks in advance.

Upvotes: 4

Views: 4502

Answers (2)

Jean-Marc

Reputation: 109

I tried the Update 2 method from Bond's answer for reading an MS Access lock file (NULL-terminated strings in 64-byte records), but the ADODB.Stream refused to open a file that was already in use. So I changed that part to:

    Const ForReading = 1 ' not predefined in VBScript, so declare it here
    Set fso = CreateObject("Scripting.FileSystemObject")
    Set f = fso.GetFile(Lfile)
    z = f.Size
    Set ts = f.OpenAsTextStream(ForReading, 0) ' 0 = TristateFalse (ASCII)
    strLog = ts.Read(z)
    ts.Close
    Set f = Nothing
    ' replace each 00 byte with a space, one for one,
    ' so the 64-byte records stay aligned
    With New RegExp
        .Pattern = "\x00"
        .Global = True
        strLog = .Replace(strLog, " ")
    End With
    ' read MS-Access computername and username
    For r = 1 To Len(strLog) Step 64
        fnd = Trim(Mid(strLog, r, 32)) & ", " & Trim(Mid(strLog, r + 32, 32)) & vbCrLf
        strRpt = strRpt & fnd
    Next

Upvotes: 1

Bond

Reputation: 16311

It looks like the first method is just replacing each NULL with a newline. If that's all you need, you can just do this:
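
A minimal sketch of that one-for-one replacement, assuming the file contents are already in a string. The variable name strText and the sample value are illustrative only, not taken from the real file:

```vbscript
' Illustrative sample: fields separated by NUL characters (Chr(0))
strText = "@@MMCIBN.000" & Chr(0) & "7" & Chr(0) & "076059"

' Replace every single NUL with a newline, one for one
strText = Replace(strText, Chr(0), vbCrLf)

WScript.Echo strText
```

Note that a run of consecutive NULs produces one newline per NUL with this approach, so runs would leave blank lines in the output.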

Updated:

OK, sounds like you need to replace each set of NULLs with a newline. Let's try this instead:

strText = fso.OpenTextFile(textFile, 1).ReadAll()

With New RegExp
    .Pattern = "\x00+"
    .Global = True
    strText = .Replace(strText, vbCrLf)
End With

objOutFile.Write strText

Update 2:

I think the Read/ReadAll methods of the TextStream class are having trouble dealing with the mix of text and binary data. Let's use an ADO Stream object to read the data instead.

' Read the "text" file using a Stream object...
Const adTypeText = 2

With CreateObject("ADODB.Stream")
    .Type = adTypeText
    .Open
    .LoadFromFile textFile
    .Charset = "us-ascii"
    strText = .ReadText()
    .Close   ' release the stream once the text has been read
End With

' Now do our regex replacement...
With New RegExp
    .Pattern = "\x00+"
    .Global = True
    strText = .Replace(strText, vbCrLf)
End With

' Now write using a standard TextStream...
With fso.CreateTextFile(file)
    .Write strText
    .Close
End With
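
As for why the second method in the question loses data: Right(line, n) returns the last n characters of the string, not everything from position n onward, so that expression keeps only the text before the first NULL plus a short tail of the line. If you would rather keep a line-by-line loop, a minimal sketch could let Replace handle every NULL in the line. The sample string below is illustrative, not real file data, and this assumes ReadLine returns the embedded NULLs intact:

```vbscript
' Illustrative sample line with NUL-separated fields (not real file data)
line = "AAA" & Chr(0) & "BBB" & Chr(0) & "CCC"

' Replace swaps every NUL in the line, so no Left/Right
' position arithmetic is needed
If InStr(line, Chr(0)) > 0 Then
    line = Replace(line, Chr(0), vbCrLf)
End If

WScript.Echo line
```

Unlike the "\x00+" regex above, Replace does not collapse a run of NULLs into a single newline, so the regex is probably still the better fit if runs can occur.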

Upvotes: 4
