Mikael

Reputation: 982

Splitting file into smaller files, working script, but need some tweaks

I have a script that looks for a delimiter in a text file containing several reports. The script saves each individual report as its own text document. The tweaks I'm trying to achieve are:

In the middle of the data on each page there is a line containing SPEC #: RX:<string>. I want that string to be used as the filename.

It currently saves from each delimiter down to the next one. This ignores the first report and grabs every one after it. I want each file to run from a delimiter UP to the preceding one, but I haven't figured out how to achieve that.

$InPC = "C:\Users\path"
Get-ChildItem -Path $InPC -Filter *.txt | ForEach-Object -Process {
    $basename = $_.BaseName
    $m = ( ( Get-Content $_.FullName | Where { $_ | Select-String "END OF REPORT" -Quiet } | Measure-Object | ForEach-Object { $_.Count } ) -ge 2)
    $a = 1
    if ($m) {
        Get-Content $_.FullName | % {
            If ($_ -match "END OF REPORT") {
                $OutputFile = "$InPC\$basename _$a.txt"
                $a++
            }
            Add-Content $OutputFile $_
        }
        Remove-Item $_.FullName
    }
}

This works, but as stated, each output file starts with END OF REPORT, and the first report in the file is omitted because it has no END OF REPORT above it.

Edited code:

$InPC = 'C:\Path'

ForEach($File in Get-ChildItem -Path $InPC -Filter *.txt){
    $RepNum=0
    ForEach($Report in (([IO.File]::ReadAllText($File.FullName)) -split 'END OF REPORT\r?\n?' -ne '')){
        if ($Report -match 'SPEC #: RX:(?<ReportFile>.*?)\.'){
            $ReportFile=$Matches.ReportFile
        }
        $OutputFile = "{0}\{1}_{2}_{3}.txt" -f  $InPC,$File.BaseName,$ReportFile,++$RepNum
        $Report | Add-Content $OutputFile
    }
    # Remove-Item $File.FullName
}

Upvotes: 2

Views: 66

Answers (1)

user6811411

I suggest using regular expressions to:

  • read in the file with the -Raw parameter,
  • split the file at the marker END OF REPORT into sections, and
  • use 'SPEC #: RX:(?<ReportFile>.*?)\.' with a named capture group to extract the string.
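The split and capture steps listed above can be tried on an in-memory string before touching any files. This is only a minimal sketch: the sample text and the variable names $text, $reports and $names are made up for illustration.

```powershell
# Hypothetical sample text; a real run would read it from a file instead.
$text = "alpha`nSPEC #: RX:string1.`nEND OF REPORT`nbeta`nSPEC #: RX:string2.`nEND OF REPORT`n"

# -split cuts the text at every marker; -ne '' drops the empty piece
# left over after the last marker.
$reports = @($text -split 'END OF REPORT\r?\n?' -ne '')

# Collect the named capture group from each section that contains it.
$names = foreach ($report in $reports) {
    if ($report -match 'SPEC #: RX:(?<ReportFile>.*?)\.') {
        $Matches.ReportFile
    }
}

$names
```

Because each section ends where its marker was, every report (including the first) is kept, and the marker itself is not written to the output.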

Edit: adapted to PowerShell v2 (Get-Content has no -Raw parameter there, so the lines are joined instead)

## Q:\Test\2019\09\12\SO_57911471.ps1
$InPC = 'C:\Users\path' # 'Q:\Test\2019\09\12\' # 

ForEach($File in Get-ChildItem -Path $InPC -Filter *.txt){
    $RepNum=0
    ForEach($Report in (((Get-Content $File.FullName) -join "`n") -split 'END OF REPORT\r?\n?' -ne '')){

        if ($Report -match 'SPEC #: RX:(?<ReportFile>.*?)\.'){
            $ReportFile=$Matches.ReportFile
        }
        $OutputFile = "{0}\{1}_{2}_{3}.txt" -f  $InPC,$File.BaseName,$ReportFile,++$RepNum
        $Report | Add-Content $OutputFile
    }
    # Remove-Item $File.FullName
}

This contrived sample text:

## Q:\Test\2019\09\12\SO_57911471.txt
I have a script here that looks for a delimiter in a text file with several reports in it.  
In the middle of the data of each page there is - 
SPEC #: RX:string1.  
I want that string to be saved as the filename.
END OF REPORT

I have a script here that looks for a delimiter in a text file with several reports in it.  
In the middle of the data of each page there is - 
SPEC #: RX:string2.  
I want that string to be saved as the filename.
END OF REPORT

yields:

> Get-ChildItem *string* -name
SO_57911471_string1_1.txt
SO_57911471_string2_2.txt

The appended $RepNum is just a precaution in case the string cannot be matched.
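One caveat with that precaution: when a section has no match, $ReportFile still holds the value from the previous section. A possible way to make the fallback explicit is sketched below. Get-ReportName and the 'unknown' placeholder are hypothetical and not part of the script above.

```powershell
# Hypothetical helper: returns the captured string, or a fallback
# when the section contains no SPEC #: RX: line.
function Get-ReportName {
    param([string]$Report, [string]$Fallback = 'unknown')
    if ($Report -match 'SPEC #: RX:(?<ReportFile>.*?)\.') {
        return $Matches.ReportFile
    }
    return $Fallback
}

$withId    = Get-ReportName "header`nSPEC #: RX:string1.`nfooter"
$withoutId = Get-ReportName "header only, no identifier"
```

In the loop, $ReportFile = Get-ReportName $Report would then replace the if block, and the running number still keeps the file names unique.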

Upvotes: 2
