I have a 1 GB text file and my PowerShell code is taking 5 hours to split it based on record names.
"STD|AAAA|X|dummy" "dummy" "STD|BBBB|X|dummy" "dummy" "STD|CCCC|X|dummy" "dummy" "STD|AAAA|X|dummy" "dummy"
Expected result is to create 3 text files (AAAA.txt, BBBB.txt, CCCC.txt) which also contain the matched lines.
$data = Get-Content "$input_path"
foreach ($line in $data) {
    $matches = [regex]::Match($line, 'STD\|(?<TheFilename>[^\|`"]+)[\|`"]+')
    $FirstLvl = $matches.Groups['TheFilename']
    if ($FirstLvl.Value -ne "") {
        $FullPath = Join-Path $ParentPath -ChildPath $FirstLvl.Value
        $line | Out-File -FilePath "$FullPath" -Append
    }
}
First of all, do not read the entire input file into memory; process it via the pipeline instead. Also, split each line at the pipe characters to extract the file basename rather than using a regular expression match. And are there actually lines that don't have a field for the basename? If not, checking whether $FirstLvl is empty is a waste of resources.
Get-Content $input_path | ForEach-Object {
    $FirstLvl = $_.Split('|')[1]
    $_ | Add-Content "${ParentPath}\${FirstLvl}.txt"
}
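If per-line pipeline overhead is still a problem, Get-Content's -ReadCount parameter can send lines down the pipeline in batches instead of one at a time. This is not from the original answer, just a hedged sketch of the same split with batched reads; $input_path and $ParentPath are assumed to be defined as above, and the batch size of 1000 is arbitrary.

Get-Content $input_path -ReadCount 1000 | ForEach-Object {
    # With -ReadCount, $_ is an array of up to 1000 lines, so loop over it explicitly.
    foreach ($line in $_) {
        $FirstLvl = $line.Split('|')[1]
        Add-Content -Path "${ParentPath}\${FirstLvl}.txt" -Value $line
    }
}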
If you require even better performance than that, you need to work with .NET methods directly.
$reader = [IO.StreamReader]$input_path
$writers = @{}

while ($reader.Peek() -ge 0) {
    $line = $reader.ReadLine()
    $FirstLvl = $line.Split('|')[1]

    # Create a writer for this basename the first time it appears.
    if (-not $writers.Contains($FirstLvl)) {
        $writers[$FirstLvl] = [IO.StreamWriter]"${ParentPath}\${FirstLvl}.txt"
    }

    $writers[$FirstLvl].WriteLine($line)
}

$reader.Close()
$reader.Dispose()

foreach ($key in $writers.Keys) {
    $writers[$key].Close()
    $writers[$key].Dispose()
}
By storing an individual writer per output file in a hashtable, you avoid having to re-open the output files repeatedly.
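One caveat, not part of the original answer: if an exception is thrown while reading, the script above never reaches the cleanup code and the file handles stay open. A minimal sketch of the same loop wrapped in try/finally, again assuming $input_path and $ParentPath are already defined:

$reader = [IO.StreamReader]$input_path
$writers = @{}
try {
    while ($reader.Peek() -ge 0) {
        $line = $reader.ReadLine()
        $FirstLvl = $line.Split('|')[1]
        if (-not $writers.Contains($FirstLvl)) {
            $writers[$FirstLvl] = [IO.StreamWriter]"${ParentPath}\${FirstLvl}.txt"
        }
        $writers[$FirstLvl].WriteLine($line)
    }
}
finally {
    # Dispose() also closes the streams, so the handles are released even on error.
    $reader.Dispose()
    foreach ($writer in $writers.Values) {
        $writer.Dispose()
    }
}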