Reputation: 10068
I have a large CSV file (1.6 GB). How can I delete a specific line, e.g. line 1005?
Upvotes: 2
Views: 3116
Reputation: 437618
Note: The solutions below eliminate a single line from any text-based file by line number. As marsze points out, additional considerations may apply to CSV files, where care must be taken not to eliminate the header row, and rows may span multiple lines if they have values with embedded newlines; use of a CSV parser is a better choice in that case.
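For the CSV-specific case mentioned in the note, a parser-based approach might look like the following minimal sketch. It assumes that "line" means a data row (the header is not counted), and it uses made-up sample data and temp-file names. Note that Export-Csv rewrites every row, so quoting in the output may differ from the original file.

```powershell
# Hypothetical sample CSV; the second data row spans two physical lines.
$csvIn  = Join-Path ([IO.Path]::GetTempPath()) 'sample-in.csv'
$csvOut = Join-Path ([IO.Path]::GetTempPath()) 'sample-out.csv'
@'
Name,Note
alpha,"one line"
beta,"two
lines"
gamma,"one line"
'@ | Set-Content -Encoding Utf8 $csvIn

# Drop data row 2; the header row is handled by the parser and never counted.
$rowToDrop = 2
$rowNo = 0
Import-Csv $csvIn |
  ForEach-Object { if (++$rowNo -ne $rowToDrop) { $_ } } |
  Export-Csv -NoTypeInformation -Encoding Utf8 $csvOut
```

Because the rows are parsed as records, the embedded newline in the second row does not throw off the row count, which a plain line-number approach would get wrong.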
If performance isn't paramount, here's a memory-friendly pipeline-based way to do it:
Get-Content file.txt |
Where-Object ReadCount -ne 1005 |
Set-Content -Encoding Utf8 new-file.txt
Get-Content adds a (somewhat obscurely named) .ReadCount property to each line it outputs, which contains the 1-based line number.
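To see the .ReadCount property in action, here is a small sketch that uses a throwaway three-line file (the file name is an assumption chosen for illustration):

```powershell
# Create a tiny demo file in the temp directory (hypothetical name).
$demo = Join-Path ([IO.Path]::GetTempPath()) 'readcount-demo.txt'
Set-Content -Encoding Utf8 $demo -Value 'first', 'second', 'third'

# Prefix each line with the line number that Get-Content attached to it.
$numbered = Get-Content $demo | ForEach-Object { '{0}: {1}' -f $_.ReadCount, $_ }
$numbered
# 1: first
# 2: second
# 3: third
```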
Note that the input file's character encoding isn't preserved by Get-Content, so you should control Set-Content's output encoding explicitly, as shown above, using UTF-8 as an example.
Unless you read the whole file into memory, you must output to a new file, at least temporarily; you can then replace the original file with the temporary output file as follows:
Move-Item -Force new-file.txt file.txt
A faster, but memory-intensive alternative based on direct use of the .NET framework, which also allows you to update the file in place:
$file = 'file.txt'
$lines = [IO.File]::ReadAllLines("$PWD/$file")
Set-Content -Encoding UTF8 $file -Value $lines[0..1003 + 1005..($lines.Count-1)]
Note the need to use "$PWD/$file", i.e., to explicitly prepend the current directory path to the relative path stored in $file, because the .NET framework's idea of what the current directory is differs from PowerShell's.
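The difference between the two notions of "current directory" can be inspected directly. The sketch below also shows one possible alternative to string interpolation for building a full path; the variable names are illustrative, and it assumes the current location is a file-system directory.

```powershell
$file = 'file.txt'   # relative path, as in the answer

# PowerShell's current location vs. the process-wide .NET working directory.
# These can differ, e.g. after Set-Location in the current session.
$psDir  = $PWD.ProviderPath
$netDir = [Environment]::CurrentDirectory
"PowerShell: $psDir"
".NET:       $netDir"

# The answer's approach: prepend PowerShell's location explicitly.
$fullPath = "$PWD/$file"

# A possible alternative that also works for files that don't exist yet:
$fullPath2 = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($file)
```

For a file that already exists, Convert-Path -LiteralPath $file is a shorter way to get the full path.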
While $lines = Get-Content $file would be functionally equivalent to $lines = [IO.File]::ReadAllLines("$PWD/$file"), it would perform noticeably worse.
0..1003 creates an array of indices from 0 to 1003; + concatenates that array with indices 1005 through the rest of the input array; note that array indices are 0-based, whereas line numbers are 1-based.
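The index arithmetic is easier to follow on a small array. The following sketch uses five made-up lines and a variable for the 1-based line number. It also shows a list-based alternative, which is not part of the original answer, because the range expression misbehaves when the first line is targeted (0..-1 counts down and yields the indices 0 and -1).

```powershell
$lines = 'line1', 'line2', 'line3', 'line4', 'line5'
$lineToDrop = 3   # 1-based line number

# Indices before the target (0..1) plus indices after it (3..4).
$kept = $lines[0..($lineToDrop - 2) + $lineToDrop..($lines.Count - 1)]
$kept   # line1, line2, line4, line5

# Alternative sketch: copy into a generic list and remove by 0-based index.
# This also handles the first and last line without special cases.
$list = [System.Collections.Generic.List[string]] $lines
$list.RemoveAt($lineToDrop - 1)
```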
Also note how the resulting array is passed to Set-Content as a direct argument via -Value, which is faster than passing it via the pipeline (... | Set-Content ...), where element-by-element processing would be performed.
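One way to check the performance difference on your own machine is to time both variants with Measure-Command. The sketch below uses generated sample data and hypothetical temp-file names; actual timings depend on the PowerShell version and hardware, so no numbers are claimed here.

```powershell
# Generate sample lines (assumed size; adjust as needed).
$data = 1..20000 | ForEach-Object { "row $_" }

$tmpDir      = [IO.Path]::GetTempPath()
$viaValue    = Join-Path $tmpDir 'via-value.txt'
$viaPipeline = Join-Path $tmpDir 'via-pipeline.txt'

# Whole array passed as a single argument.
$tValue = Measure-Command { Set-Content -Encoding Utf8 $viaValue -Value $data }

# Same data sent element by element through the pipeline.
$tPipeline = Measure-Command { $data | Set-Content -Encoding Utf8 $viaPipeline }

'{0:N0} ms via -Value, {1:N0} ms via the pipeline' -f $tValue.TotalMilliseconds, $tPipeline.TotalMilliseconds
```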
Finally, a memory-friendly method that is faster than the pipeline-based method:
$file = 'file.txt'
$outFile = [IO.File]::CreateText("$PWD/new-file.txt")
$lineNo = 0
try {
  foreach ($line in [IO.File]::ReadLines("$PWD/$file")) {
    if (++$lineNo -eq 1005) { continue }
    $outFile.WriteLine($line)
  }
} finally {
  $outFile.Dispose()
}
Note the use of "$PWD/..." in the .NET API calls: it ensures that a full path is passed, which is necessary because .NET's working directory usually differs from PowerShell's.
As with the pipeline-based command, you may have to replace the original file with the new file afterwards.
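The streaming method and the replacement step could be combined into a small helper along these lines. This is only a sketch: the function name Remove-FileLine, its parameters, and the ".tmp" suffix for the temporary file are assumptions, and the output is always written as UTF-8 without a BOM, which may differ from the input file's encoding.

```powershell
function Remove-FileLine {   # hypothetical helper name
  param(
    [Parameter(Mandatory)] [string] $Path,
    [Parameter(Mandatory)] [int] $LineNumber   # 1-based
  )
  # Resolve to a full path so the .NET calls don't depend on .NET's working directory.
  $fullPath = Convert-Path -LiteralPath $Path
  $tempPath = "$fullPath.tmp"
  $writer = [IO.File]::CreateText($tempPath)
  $lineNo = 0
  $completed = $false
  try {
    # Stream the file line by line, skipping only the target line.
    foreach ($line in [IO.File]::ReadLines($fullPath)) {
      if (++$lineNo -eq $LineNumber) { continue }
      $writer.WriteLine($line)
    }
    $completed = $true
  } finally {
    $writer.Dispose()
  }
  # Replace the original only if the copy finished without an error.
  if ($completed) {
    Move-Item -Force -LiteralPath $tempPath -Destination $fullPath
  }
}

# Usage with a throwaway sample file:
$sample = Join-Path ([IO.Path]::GetTempPath()) 'remove-line-demo.txt'
Set-Content -Encoding Utf8 $sample -Value 'a', 'b', 'c', 'd', 'e'
Remove-FileLine -Path $sample -LineNumber 3
Get-Content $sample   # a, b, d, e
```

The completion flag is there so that a failure partway through the copy leaves the original file untouched.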
Upvotes: 11