Reputation: 67
I am trying to extract the sentences that appear between occurrences of a particular word in a file. Specifically, I want the sentences between the first pair of 'GO' words. My approach is to split the file on the word 'GO' and then print the second element of the resulting array (the sentences starting with SET in this example). However, PowerShell is not using GO as the separator; it seems to be splitting on newlines instead, and prints the second sentence of the file.
Please note that I need to read the file first and then do the extraction.
Content of the file:
Home address "TJ One way"
Office address "C company Two way"
GO
SET ANSI_NULLS, ANSI_PADDING, ANSI_WARNINGS, ARITHABORT, CONCAT_NULL_YIELDS_NULL, QUOTED_IDENTIFIER ON;
SET NUMERIC_ROUNDABORT OFF;
GO
Home address "TJ One way"
Office address "C company Two way"
GO
:on error exit
GO
My code:
$path = 'D:\Scripts'
$deltaFile = 'GoSampleFile.txt'
$modifiedDelta = 'GoSampleFile1.txt'
New-Item -path $path -Name $modifiedDelta -ItemType file -Force
#Split for each appearing GO, after escaping the double quotes
(Get-Content $path'\'$deltaFile).replace('"', '`"') | Set-Content $path'\'$modifiedDelta
$separator = 'GO'
$modifiedDeltaString = Get-Content $path'\'$modifiedDelta
#Write-Host $modifiedDeltaString
#Write-Host $separator
$goArray = $modifiedDeltaString -split "GO", 0, "SimpleMatch"
Write-Output $goArray[1]
#Housekeeping of the temporary file
Remove-Item $path'\'$modifiedDelta
Upvotes: 2
Views: 1126
Reputation: 1620
This might as well be a separate answer, since there's another problem and I'll go into more detail.
As DAX has said, you need to use -Raw, because Get-Content returns an array of strings, one for each line. When you use -split on an array, each element is split separately.
For example, given the following array:
[0] "Testing"
[1] "This is a test"
[2] "'tis still a test"
$array -split "is", 0, "SimpleMatch"
[0] "Testing"
[1] "Th"
[2] " "
[3] " a test"
[4] "'t"
[5] " still a test"
When you use the -Raw switch, Get-Content returns the entire file as a single string with newline characters.
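Applied to the question's own sample, the difference looks like this. This is a minimal in-memory sketch: the $lines array stands in for what Get-Content returns without -Raw, and joining it with newlines approximates what -Raw returns. The variable names are only for illustration.

```powershell
# Stand-in for Get-Content without -Raw: one string per line (first part of the sample file)
$lines = @(
    'Home address "TJ One way"'
    'Office address "C company Two way"'
    'GO'
    'SET ANSI_NULLS, ANSI_PADDING, ANSI_WARNINGS, ARITHABORT, CONCAT_NULL_YIELDS_NULL, QUOTED_IDENTIFIER ON;'
    'SET NUMERIC_ROUNDABORT OFF;'
    'GO'
)

# -split on an array splits each line on its own, so [1] is simply the second line
$perLine = $lines -split 'GO', 0, 'SimpleMatch'
$perLine[1]    # Office address "C company Two way"

# Approximation of Get-Content -Raw: the whole text as one string with newlines
$raw = $lines -join [Environment]::NewLine

# Now GO separates blocks of text, so [1] is everything between the first two GOs
$perBlock = $raw -split 'GO', 0, 'SimpleMatch'
$perBlock[1].Trim()
```

Keep in mind that -split is case-insensitive by default, so a plain match on GO could also cut through words that merely contain "go".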
The other thing I'll point out is that you're escaping the quotes, which isn't necessary. You only need to escape quotes when typing a string literal, so that PowerShell doesn't assume you're terminating the string:
$t = "This is a "bad" test"
> At line:1 char:18
+ $t = "This is a "bad" test"
+                  ~~~~~~~~~~
Unexpected token 'bad" test"' in expression or statement.
You need to escape the quotes so that "bad" is still part of the string.
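For reference, these are the usual ways to get a literal double quote into a string you type yourself. All three variables end up holding exactly the same text.

```powershell
# Inside a double-quoted string, a backtick escapes the double quote...
$escaped = "This is a `"bad`" test"

# ...and so does doubling it
$doubled = "This is a ""bad"" test"

# A single-quoted string needs no escaping for double quotes at all
$single = 'This is a "bad" test'

$escaped -ceq $doubled   # True
$escaped -ceq $single    # True
```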
However, when you read from a file, the quotes are already part of the string:
Get-Content C:\test.txt
> This is a "bad" test
Because you are not typing the quotes into the console, they do not need to be escaped. To demonstrate with your own code, look at the full content of your temp file:
Home address `"TJ One way`"
Office address `"C company Two way`"
I can't think of any reason you would need to do this, except perhaps if you wanted to copy and paste the text into a console.
This may appear to work for now, but only because the SQL I assume you are trying to run doesn't contain double quotes. I'm not sure whether your SQL uses them, but if it did, the extra backticks would end up in the script and throw an error. Either way, it's an extra step you don't need, so you can scrap the temp file entirely and read straight from the original.
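Here is a sketch of the script with the temp file removed, reading the original file directly with -Raw. Two assumptions to flag: the sample is written to the temp folder first only so the example runs anywhere (in the question the file lives at D:\Scripts\GoSampleFile.txt), and the split uses a regular expression anchored to lines containing only GO. That regex is my substitution for the original SimpleMatch split, not something from the question.

```powershell
# Demo only: write the question's sample to the temp folder so the script is self-contained
$file = Join-Path ([System.IO.Path]::GetTempPath()) 'GoSampleFile.txt'
@'
Home address "TJ One way"
Office address "C company Two way"
GO
SET ANSI_NULLS, ANSI_PADDING, ANSI_WARNINGS, ARITHABORT, CONCAT_NULL_YIELDS_NULL, QUOTED_IDENTIFIER ON;
SET NUMERIC_ROUNDABORT OFF;
GO
Home address "TJ One way"
Office address "C company Two way"
GO
:on error exit
GO
'@ | Set-Content -Path $file

# Read the original file as one string: no temp copy, no quote escaping
$content = Get-Content -Path $file -Raw

# (?m) makes ^ and $ match at each line, so only a line that is just GO
# (plus optional whitespace) acts as a separator; "GOTO" inside text would not
$blocks = $content -split '(?m)^\s*GO\s*$'

# Element [1] is the text between the first and second GO lines
$firstBlock = $blocks[1].Trim()
Write-Output $firstBlock

# Remove the demo file
Remove-Item -Path $file
```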
Upvotes: 2
Reputation: 35408
Use Get-Content -Raw ... to read the contents as a single string instead of an array with one string per line.
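A quick way to see the difference is to check what comes back in each case. The two-line file in the temp folder and the name RawDemo.txt are hypothetical, just for the demonstration.

```powershell
# Hypothetical two-line demo file in the temp folder
$file = Join-Path ([System.IO.Path]::GetTempPath()) 'RawDemo.txt'
Set-Content -Path $file -Value 'first line', 'second line'

$lines = Get-Content -Path $file        # array: one string per line
$text  = Get-Content -Path $file -Raw   # single string, newlines included

$lines.GetType().Name   # Object[]
$lines.Count            # 2
$text.GetType().Name    # String

Remove-Item -Path $file
```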
Upvotes: 3