I'm parsing HTML from a web server (specifically a Fanuc controller) and assigning the innerText
to an object.
# Make sure the controller responds
if ($webBody.StatusCode -eq 200) {
    Write-Host "Response is Good!" -ForegroundColor DarkGreen
    # Grab the text of every <PRE> element in the parsed page
    $preBody = $webBody.ParsedHtml.body.getElementsByTagName('PRE') | Select-Object -ExpandProperty innerText
    $preBody
}
The output looks something like this:
[1-184 above]
[185] = 0 ''
[186] = 0 ''
[187] = 0 ''
[188] = 0 ''
[189] = 0 ''
[and so on]
I only want to read the data from specific entries, for example 190, 191 and 193. What's the best way to do this? I'm struggling to filter the unwanted data out of the object.
Currently I have a VBScript app that outputs to a txt file, cleans the data, then reads it back and turns it into a SQL insert. I'm trying to improve on it with PowerShell and am keen to keep everything within the program if possible.
Any help greatly appreciated.
Assuming the data set is small enough to hold in memory, you could parse it with a regex into PowerShell objects and then use Where-Object
to filter.
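# Regex with a capture group for each important value
$RegEx = "\[(.*)\]\s=\s(\d+)\s+'(.*)'"
$IndexesToMatch = @(190, 191, 193)
# innerText is one multi-line string, so split it into lines, trim each line,
# and keep only the lines that match the pattern
$ParsedValues = $preBody -split '\r?\n' | ForEach-Object { $_.Trim() } | Where-Object { $_ -match $RegEx } | ForEach-Object {
    [PSCustomObject]@{
        index  = $_ -replace $RegEx, '$1'
        int    = $_ -replace $RegEx, '$2'
        string = $_ -replace $RegEx, '$3'
    }
}
# Keep only the requested indexes
$ParsedValues | Where-Object { $_.index -in $IndexesToMatch }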
Input :
[190] = 1 'a'
[191] = 2 'b'
[192] = 3 'c'
[193] = 4 'd'
[194] = 5 'e'
Output :
index int string
----- --- ------
190 1 a
191 2 b
193 4 d
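A variation on the same idea, as a minimal sketch rather than a definitive implementation: it uses -match with the automatic $Matches variable, so lines that do not fit the pattern are skipped and the numeric fields can be cast to [int]. The sample text and the names $sample, $pattern, $wanted, $rows and $result are illustrative assumptions; $sample stands in for the $preBody value from the question, assumed here to be a single multi-line string.

```powershell
# Sample text standing in for $preBody (assumption: one multi-line string)
$sample = @'
[190] = 1 'a'
[191] = 2 'b'
[192] = 3 'c'
[193] = 4 'd'
[194] = 5 'e'
'@

# Capture groups: 1 = index, 2 = integer value, 3 = quoted text
$pattern = "^\[(\d+)\]\s*=\s*(\d+)\s+'(.*)'"
$wanted  = 190, 191, 193

# Split into lines and build one object per line that matches the pattern
$rows = foreach ($line in ($sample -split '\r?\n')) {
    if ($line.Trim() -match $pattern) {
        [PSCustomObject]@{
            Index = [int]$Matches[1]
            Value = [int]$Matches[2]
            Text  = $Matches[3]
        }
    }
}

# Keep only the requested indexes
$result = $rows | Where-Object { $_.Index -in $wanted }
$result
```

Because Index and Value are integers here, the filtered objects can be sorted or compared numerically before being turned into a SQL insert.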