razielxx
razielxx

Reputation: 99

Edit column in Tab-delimited Text file using Powershell

I have a very large (~250k row and 171 Column) Tab delimited text file that I need to edit. I need to add the letter "H" to the third column on every row.

So I need it to go from 03/20/2020 09:00 03/20/2020 10:00 1269805 ...... to 03/20/2020 09:00 03/20/2020 10:00 H1269805 ..... I actually have this working with the following code:

$source = Get-ChildItem "C:\test\input\*.txt"
$target = "C:\test\test.txt"

$data = Get-Content -Path $source | ConvertFrom-Csv -Delimiter "`t" -Header Column1, Column2, Column3, Column4, Column5, Column6, Column7, Column8, Column9, Column10, Column11, Column12, Column13, Column14, Column15, Column16, Column17, Column18, Column19, Column20,
Column21, Column22, Column23, Column24, Column25, Column26, Column27, Column28, Column29, Column30, Column31, Column32, Column33, Column34, Column35, Column36, Column37, Column38, Column39, Column40,
Column41, Column42, Column43, Column44, Column45, Column46, Column47, Column48, Column49, Column50, Column51, Column52, Column53, Column54, Column55, Column56, Column57, Column58, Column59, Column60,
Column61, Column62, Column63, Column64, Column65, Column66, Column67, Column68, Column69, Column70, Column71, Column72, Column73, Column74, Column75, Column76, Column77, Column78, Column79, Column80,
Column81, Column82, Column83, Column84, Column85, Column86, Column87, Column88, Column89, Column90, Column91, Column92, Column93, Column94, Column95, Column96, Column97, Column98, Column99, Column100,
Column101, Column102, Column103, Column104, Column105, Column106, Column107, Column108, Column109, Column110, Column111, Column112, Column113, Column114, Column115, Column116, Column117, Column118, Column119, Column120, 
Column121, Column122, Column123, Column124, Column125, Column126, Column127, Column128, Column129, Column130, Column131, Column132, Column133, Column134, Column135, Column136, Column137, Column138, Column139, Column140,
Column141, Column142, Column143, Column144, Column145, Column146, Column147, Column148, Column149, Column150, Column151, Column152, Column153, Column154, Column155, Column156, Column157, Column158, Column159, Column160,
Column161, Column162, Column163, Column164, Column165, Column166, Column167, Column168, Column169, Column170, Column171

$data | % {
    If ($_.Column3) {
        #import ID
        $_.Column3 = "H$($_.Column3)"
    } }

$data | Select Column1, Column2, Column3, Column4, Column5, Column6, Column7, Column8, Column9, Column10, Column11, Column12, Column13, Column14, Column15, Column16, Column17, Column18, Column19, Column20,
Column21, Column22, Column23, Column24, Column25, Column26, Column27, Column28, Column29, Column30, Column31, Column32, Column33, Column34, Column35, Column36, Column37, Column38, Column39, Column40,
Column41, Column42, Column43, Column44, Column45, Column46, Column47, Column48, Column49, Column50, Column51, Column52, Column53, Column54, Column55, Column56, Column57, Column58, Column59, Column60,
Column61, Column62, Column63, Column64, Column65, Column66, Column67, Column68, Column69, Column70, Column71, Column72, Column73, Column74, Column75, Column76, Column77, Column78, Column79, Column80,
Column81, Column82, Column83, Column84, Column85, Column86, Column87, Column88, Column89, Column90, Column91, Column92, Column93, Column94, Column95, Column96, Column97, Column98, Column99, Column100,
Column101, Column102, Column103, Column104, Column105, Column106, Column107, Column108, Column109, Column110, Column111, Column112, Column113, Column114, Column115, Column116, Column117, Column118, Column119, Column120, 
Column121, Column122, Column123, Column124, Column125, Column126, Column127, Column128, Column129, Column130, Column131, Column132, Column133, Column134, Column135, Column136, Column137, Column138, Column139, Column140,
Column141, Column142, Column143, Column144, Column145, Column146, Column147, Column148, Column149, Column150, Column151, Column152, Column153, Column154, Column155, Column156, Column157, Column158, Column159, Column160,
Column161, Column162, Column163, Column164, Column165, Column166, Column167, Column168, Column169, Column170, Column171 | ConvertTo-Csv -Delimiter "`t" -NoTypeInformation | % { $_ -replace '"', "" } | Select-Object -Skip 1 | Set-Content -Path $target

The problem I have is it takes a long time. I understand it is a large file, but is there any other way to do this faster? I feel like the converting to and from CSV is what is taking the longest, but I may be wrong. The whole process takes roughly 25 minutes to complete. Any help would be great.

Upvotes: 1

Views: 457

Answers (1)

mklement0
mklement0

Reputation: 440197

To speed up processing, avoid the pipeline, use .NET types for file I/O and use plain-text operations:

# Create the output file.
$outFile = [IO.File]::CreateText($target)
# Loop over all input files
foreach ($file in Get-ChildItem C:\test\input\*.txt) {   
  # Loop over a given file's lines.
  foreach ($line in [IO.File]::ReadLines($file.FullName)) {
    # Prepend 'H' to the 3rd column and append to the output file.
    $outFile.WriteLine(($line -replace '^.*?\t.*?\t', '$&H'))
  }
}
$outFile.Close()

Note:

  • Be sure to always pass full file paths to .NET methods, because .NET's working directory usually differs from PowerShell's.

  • .NET file I/O methods default to BOM-less UTF-8 encoding.

  • The H is inserted in front of the 3rd tab-separated column using PowerShell's regex-based -replace operator.

Upvotes: 1

Related Questions