Reputation: 41
I have a large text file with millions of lines from which I need to remove NUL characters (they show up as "NUL" in Notepad++, see pic). Search-and-replace with \0 works in Notepad++ but takes forever. How could I remove these NUL characters with a Windows command, which would probably be faster?
Upvotes: 3
Views: 3053
Reputation: 1
I found that if each line contains a known text string, you can use the FIND command (not FINDSTR, though) and redirect the output back to a file. FIND will "eat" any NUL characters in the strings.
So if, say, every line in your data contains or is terminated with a '.' character, you can eliminate the NULs with:
FIND "." inputfile.txt > outputfile.txt
FINDSTR won't work, because it prints each matching line exactly as it appears in the input, NULs included.
Upvotes: 0
Reputation: 938
I'd use the PowerShell approach instead of the cmd one if you can; it'll be much quicker.
Run this in cmd:
powershell -c "(Get-Content .\file.txt) -replace '\x00+', '' | Set-Content .\file.txt"
This can be problematic with files of 1 GB and larger, since it loads the whole file into memory; for those I'd recommend a full PowerShell script rather than a one-liner.
To make it quicker, you can use .NET streams from within PowerShell:
# Open file.txt for reading
$reader = [IO.File]::OpenText("file.txt")
# Write the output to file2.txt (we can't write back to the same file, as it is locked by the StreamReader)
$writer = New-Object System.IO.StreamWriter -ArgumentList ("file2.txt")
# Loop over the lines in the file and strip the NUL characters
while ($reader.Peek() -ge 0) {
    $line = $reader.ReadLine()
    # Replace each NUL character ([char]0) with an empty string.
    # Note: '\0' would not work here - in PowerShell that is a literal backslash followed by a zero.
    $writer.WriteLine($line.Replace([string][char]0, ''))
}
# Close both streams
$reader.Close()
$writer.Close()
Processing a 400 MB file with almost 2 million lines this way took ~6 seconds.
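For what it's worth, the same streaming idea is not PowerShell-specific: if you happen to have Python available, a minimal sketch looks like this (file names are placeholders; the sample-input step is only there to make the snippet self-contained):

```python
# Create a small sample input containing NUL bytes (stand-in for your real file)
with open("file.txt", "wb") as f:
    f.write(b"first\x00line\nsec\x00\x00ond line\n")

# Stream-strip NUL bytes in fixed-size chunks, so the whole file
# is never loaded into memory at once
CHUNK = 1 << 20  # 1 MiB per read
with open("file.txt", "rb") as src, open("file2.txt", "wb") as dst:
    while True:
        chunk = src.read(CHUNK)
        if not chunk:
            break
        dst.write(chunk.replace(b"\x00", b""))
```

Because NUL is a single byte, a chunk boundary can never split one, so chunked reading is safe here.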
Upvotes: 2