Blücher

Reputation: 833

Parsing and splitting files based on the string

I have a very large file (hence .ReadLines) that I need to parse and split into other files quickly and efficiently. Each line that contains a keyword should be copied and appended to a specific file. This is what I have so far; the script runs, but the files aren't getting populated.

$filename = "C:\dev\powershell\test1.csv"

foreach ($line in [System.IO.File]::ReadLines($filename)) {
    if    ($line | %{$_ -match "Apple"}){Out-File -Append Apples.txt}
    elseif($line | %{$_ -match "Banana"}){Out-File -Append Bananas.txt}
    elseif($line | %{$_ -match "Pear"}){Out-File -Append Pears.txt}
}

Example content of the csv file:

Apple,Test1,Cross1
Apple,Test2,Cross2
Apple,Test3,Cross3
Banana,Test4,Cross4
Pear,Test5,Cross5

I want Apples.txt to contain:

Apple,Test1,Cross1
Apple,Test2,Cross2
Apple,Test3,Cross3

Upvotes: 0

Views: 75

Answers (1)

Mathias R. Jessen

Reputation: 174485

Couple of things:

Your if conditions don't need %/ForEach-Object; -match will do on its own:

foreach ($line in [System.IO.File]::ReadLines($filename)) {
  if($line -match "Apple"){
    # output to apple.txt
  }
  elseif($line -match "Banana"){
    # output to banana.txt
  }
  # etc...
}

The files aren't getting populated because you're not actually sending any output to Out-File:

foreach ($line in [System.IO.File]::ReadLines($filename)) {
  if($line -match "Apple"){
    # send $line to the file
    $line |Out-File apple.txt -Append
  }
  # etc...
}

If your files are really massive and you expect a lot of matching lines, I'd recommend using a StreamWriter for the output files; otherwise Out-File will open and close the file for every matching line:

$OutFiles = @{
  'apple'  = New-Object System.IO.StreamWriter $PWD\apples.txt
  'banana' = New-Object System.IO.StreamWriter $PWD\bananas.txt
  'pear'   = New-Object System.IO.StreamWriter $PWD\pears.txt
}

foreach ($line in [System.IO.File]::ReadLines($filename)) {
  foreach($keyword in $OutFiles.Keys){
    if($line -match $keyword){
      $OutFiles[$keyword].WriteLine($line)
      break  # stop checking further keywords for this line; continue would just move on to the next keyword
    }
  }
}

foreach($Writer in $OutFiles.Values){
  try{
    $Writer.Close()
  }
  finally{
    $Writer.Dispose()
  }
}

This way, you only have to maintain the $OutFiles hashtable if you need to update the keywords, for example.
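To illustrate that last point, here is a minimal, self-contained sketch of the same StreamWriter approach with one extra keyword added. The temp directory, the sample rows, and the 'cherry' keyword are all made up for the demo and are not part of the original question. It also assumes you want the keywords checked in a fixed order (hence [ordered]) and the writers released even if the loop throws (hence try/finally).

```powershell
# Demo input: a throwaway folder and sample file (hypothetical, for illustration only)
$dir = Join-Path ([System.IO.Path]::GetTempPath()) 'split-demo'
New-Item -ItemType Directory -Path $dir -Force | Out-Null
$filename = Join-Path $dir 'test1.csv'
Set-Content -Path $filename -Value @(
  'Apple,Test1,Cross1'
  'Apple,Test2,Cross2'
  'Banana,Test4,Cross4'
  'Cherry,Test6,Cross6'
)

# [ordered] keeps the keywords in the order written here.
# Supporting a new keyword only requires one more entry in this table.
$OutFiles = [ordered]@{
  'apple'  = New-Object System.IO.StreamWriter (Join-Path $dir 'apples.txt')
  'banana' = New-Object System.IO.StreamWriter (Join-Path $dir 'bananas.txt')
  'pear'   = New-Object System.IO.StreamWriter (Join-Path $dir 'pears.txt')
  'cherry' = New-Object System.IO.StreamWriter (Join-Path $dir 'cherries.txt')  # hypothetical addition
}

try {
  foreach ($line in [System.IO.File]::ReadLines($filename)) {
    foreach ($keyword in $OutFiles.Keys) {
      # -match is case-insensitive, so 'apple' also matches 'Apple'
      if ($line -match $keyword) {
        $OutFiles[$keyword].WriteLine($line)
        break  # first matching keyword wins
      }
    }
  }
}
finally {
  # Dispose flushes and closes each writer, even if the loop failed
  foreach ($Writer in $OutFiles.Values) {
    $Writer.Dispose()
  }
}
```

Note that the loop itself is unchanged from the version above; only the table grew. A keyword with no matching lines (pear in this sample) still produces an empty output file, because the StreamWriter creates the file as soon as it is constructed.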

Upvotes: 2
