adam78

Reputation: 10068

Windows PowerShell - delete a line by line number

I have a large CSV file (1.6 GB). How can I delete a specific line, e.g. line 1005?

Upvotes: 2

Views: 3116

Answers (1)

mklement0

Reputation: 437618

Note: The solutions below eliminate a single line from any text-based file by line number. As marsze points out, additional considerations may apply to CSV files, where care must be taken not to eliminate the header row, and rows may span multiple lines if they have values with embedded newlines; use of a CSV parser is a better choice in that case.
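For the CSV case mentioned in the note, a minimal sketch using PowerShell's built-in CSV parser might look as follows. The sample file, its column names, and the variables $csvPath, $newPath, $rowToRemove and $rowNo are made up for illustration. Note that the number now refers to a data row (record) rather than a physical line, and that parsing every row into an object is likely to be much slower than the plain-text approaches on a file of this size.

```powershell
# Hypothetical sample data: the second record contains an embedded newline,
# so it spans two physical lines of the file.
$csvPath = Join-Path ([IO.Path]::GetTempPath()) 'sample.csv'
$newPath = Join-Path ([IO.Path]::GetTempPath()) 'sample-new.csv'
Set-Content -Encoding UTF8 $csvPath -Value @(
  'Id,Note'
  '1,first'
  '2,"spans'
  'two lines"'
  '3,third'
)

$rowToRemove = 2   # 1-based index of the data row; the header is not counted
$rowNo = 0
Import-Csv $csvPath |
  ForEach-Object { if (++$rowNo -ne $rowToRemove) { $_ } } |  # pass all rows but one
    Export-Csv -NoTypeInformation -Encoding UTF8 $newPath
```

The header row is never at risk here, because Import-Csv consumes it as the column names and Export-Csv writes it back.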

If performance isn't paramount, here's a memory-friendly pipeline-based way to do it:

Get-Content file.txt | 
  Where-Object ReadCount -ne 1005 |
    Set-Content -Encoding Utf8 new-file.txt

Get-Content adds a (somewhat obscurely named) .ReadCount property to each line it outputs, which contains the 1-based line number.

  • Note that the input file's character encoding isn't preserved by Get-Content, so you should control Set-Content's output encoding explicitly, as shown above, using UTF-8 as an example.

  • Unless you read the whole file into memory first, you must output to a new file, at least temporarily; you can then replace the original file with the temporary output file with
    Move-Item -Force new-file.txt file.txt
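Before removing anything from a large file, it may be worth checking which line a given number actually refers to. The following sketch uses the same .ReadCount property for that; the sample file and the variables $path, $lineNo and $preview are hypothetical. -TotalCount makes Get-Content stop reading after the requested number of lines, so the rest of the file isn't processed.

```powershell
# Hypothetical sample file with 10 numbered lines.
$path = Join-Path ([IO.Path]::GetTempPath()) 'readcount-demo.txt'
Set-Content -Encoding UTF8 $path -Value (1..10 | ForEach-Object { "line $_" })

$lineNo = 4   # 1-based number of the line to inspect

# Read only the first $lineNo lines, keep the one whose .ReadCount matches,
# and format it as "<line number>: <text>".
$preview = Get-Content $path -TotalCount $lineNo |
  Where-Object ReadCount -eq $lineNo |
    ForEach-Object { '{0}: {1}' -f $_.ReadCount, $_ }

$preview
```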


A faster, but memory-intensive alternative based on direct use of the .NET framework, which also allows you to update the file in place:

$file = 'file.txt'
$lines = [IO.File]::ReadAllLines("$PWD/$file")
Set-Content -Encoding UTF8 $file -Value $lines[0..1003 + 1005..($lines.Count-1)]
  • Note the need to use "$PWD/$file", i.e., to explicitly prepend the current directory path to the relative path stored in $file, because the .NET framework's idea of what the current directory is differs from PowerShell's.

    • While $lines = Get-Content $file would be functionally equivalent to $lines = [IO.File]::ReadAllLines("$PWD/$file"), it would perform noticeably worse.
  • 0..1003 creates an array of indices from 0 to 1003; + concatenates that array with indices 1005 through the rest of the input array; note that array indices are 0-based, whereas line numbers are 1-based.

  • Also note how the resulting array is passed to Set-Content as a direct argument via -Value, which is faster than passing it via the pipeline (... | Set-Content ...), where element-by-element processing would be performed.
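One caveat with the index-range expression: the .. operator counts downward when its start is greater than its end, so an expression built the same way for the first or the last line of the file would select unintended indices. As a sketch of an alternative that takes the line number from a variable and works for any valid position, the lines can be copied into a resizable list and the element removed by index. The sample data and the variable names below are illustrative only.

```powershell
# Hypothetical sample data standing in for the lines read from a file.
$lines = 'line1', 'line2', 'line3', 'line4', 'line5'
$lineNo = 3   # 1-based number of the line to drop

# Copy the array into a resizable list, then remove by 0-based index.
$list = [System.Collections.Generic.List[string]]::new([string[]] $lines)
$list.RemoveAt($lineNo - 1)
$result = $list.ToArray()

$result
```

The resulting array can be passed to Set-Content via -Value in the same way as the sliced array. The ::new() syntax assumes PowerShell version 5 or later.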


Finally, a memory-friendly method that is faster than the pipeline-based method:

$file = 'file.txt'
$outFile = [IO.File]::CreateText("$PWD/new-file.txt")
$lineNo = 0
try {
  foreach ($line in [IO.File]::ReadLines("$PWD/$file")) {
    if (++$lineNo -eq 1005) { continue }
    $outFile.WriteLine($line)
  }
} finally {
  $outFile.Dispose()
}

Note the use of "$PWD/..." in the .NET API calls: it ensures that a full path is passed, which is necessary because .NET's working directory usually differs from PowerShell's.

As with the pipeline-based command, you may have to replace the original file with the new file afterwards.
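The streaming method and the replacement step can be combined into one reusable function. This is only a sketch: the function name Remove-FileLine, its parameters, and the ".tmp" suffix for the temporary file are assumptions for illustration. It uses Convert-Path to obtain a full file-system path instead of prepending $PWD, and, like CreateText in the code above, it writes the output as UTF-8 without a BOM regardless of the input file's encoding.

```powershell
function Remove-FileLine {
  param(
    [Parameter(Mandatory)] [string] $Path,       # file to modify
    [Parameter(Mandatory)] [int] $LineNumber     # 1-based line to remove
  )
  # Resolve to a full path, because .NET's working directory may differ.
  $fullPath = Convert-Path -LiteralPath $Path
  $tempPath = "$fullPath.tmp"                    # hypothetical temp-file name

  $writer = [IO.File]::CreateText($tempPath)
  $lineNo = 0
  try {
    # Stream the input line by line, skipping only the targeted line.
    foreach ($line in [IO.File]::ReadLines($fullPath)) {
      if (++$lineNo -eq $LineNumber) { continue }
      $writer.WriteLine($line)
    }
  } finally {
    $writer.Dispose()
  }
  # Replace the original file with the filtered copy.
  Move-Item -Force -LiteralPath $tempPath -Destination $fullPath
}
```

A call such as Remove-FileLine -Path file.txt -LineNumber 1005 would then correspond to the example from the question.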

Upvotes: 11
