Jack2345

Reputation: 41

Remove "NUL" characters in txt file with windows cmd

I have a large text file with millions of lines from which I need to remove NUL characters (that is how Notepad++ displays them). Searching for \0 and replacing it with nothing works in Notepad++ but takes forever. Is there a Windows command that would remove these NUL characters faster?

Upvotes: 3

Views: 3053

Answers (2)

Todd Goergen

Reputation: 1

I found that if every line contains a known text string, you can use the FIND command (not FINDSTR) and redirect its output to a new file. FIND "eats" any null characters in the lines it prints.

So if, for example, every line in your data ends with or contains a '.' character, you can eliminate the nulls with: FIND "." inputfile.txt >outputfile.txt

FINDSTR won't work because it prints each line exactly as it appears in the input.

Upvotes: 0

Karolina Ochlik

Reputation: 938

I'd use PowerShell rather than plain cmd if you can; it will be much quicker.

Run this in cmd:

powershell -c "(Get-Content .\file.txt) -replace '\x00+', '' | Set-Content .\file.txt"

This can be problematic with files of 1 GB or more, because it loads the whole file into memory. For those I'd recommend a proper PowerShell script rather than a one-liner.

To make it quicker, you can use .NET streams from within PowerShell:

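# Open file.txt (.NET resolves relative paths against the process working
# directory, which can differ from the PowerShell location; use full paths if in doubt)
$reader = [IO.File]::OpenText("file.txt")
# Save the output to file2.txt (can't save to the same file, as it is locked by the StreamReader)
$writer = New-Object System.IO.StreamWriter -ArgumentList ("file2.txt")

# Loop over the lines in the file and strip the NUL characters
while ($reader.Peek() -ge 0) {
    $line = $reader.ReadLine()
    # "`0" is PowerShell's escape for the NUL character; '\0' in single quotes
    # would be a literal backslash followed by a zero and would match nothing
    $writer.WriteLine($line.Replace("`0", ""))
}

# Close both streams
$reader.Close()
$writer.Close()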

Processing a 400 MB file with almost 2 million lines took about 6 seconds.
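One side effect of the ReadLine/WriteLine loop is that it rewrites every line ending as the platform default and always ends the output with a newline. If you need the rest of the file left untouched, a chunk-based variant might look like the sketch below. The function name Remove-NulChars, its parameter names and the 1 MB buffer size are my own choices for illustration, not anything built in. It assumes PowerShell 5 or later (for [string]::new) and relies on the default StreamReader/StreamWriter encoding (UTF-8), just like the loop above.

```powershell
# Hypothetical helper: copies InPath to OutPath, dropping every NUL character.
# Reads fixed-size chunks instead of lines, so line endings are left as they are.
function Remove-NulChars {
    param(
        [Parameter(Mandatory)] [string] $InPath,
        [Parameter(Mandatory)] [string] $OutPath
    )
    $reader = $null
    $writer = $null
    try {
        $reader = New-Object System.IO.StreamReader -ArgumentList $InPath
        $writer = New-Object System.IO.StreamWriter -ArgumentList $OutPath
        # 1 MB of characters per read; an arbitrary size, tune as needed
        $buffer = New-Object char[] 1MB
        while (($count = $reader.Read($buffer, 0, $buffer.Length)) -gt 0) {
            # Build a string from only the characters actually read
            $chunk = [string]::new($buffer, 0, $count)
            # "`0" is the PowerShell escape for NUL
            $writer.Write($chunk.Replace("`0", ''))
        }
    }
    finally {
        # Release both files even if something above throws
        if ($writer) { $writer.Dispose() }
        if ($reader) { $reader.Dispose() }
    }
}
```

Call it with full paths so that .NET and PowerShell agree on the location, for example: Remove-NulChars -InPath (Resolve-Path .\file.txt).Path -OutPath (Join-Path $PWD 'file2.txt'). I have not timed this variant against the line-based loop, so treat any speed difference as something to measure on your own file.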

Upvotes: 2
