Reputation: 11
I have a self-generated HTML file (in a local directory) with the whole body on one line:
<html><head><META http-equiv="Content-Type" content="text/html; charset=UTF-8"><title>server - path</title></head><body><H1>server - path</H1><hr>
<pre><A HREF="/logs/folder/">[To Parent Directory]</A><br><br> jeudi 5 janvier 2017 19:38 116483 <A HREF="/folder/file1.csv">file1.csv</A><br> jeudi 5 janvier 2017 19:39 138397 <A HREF="/folder/file2.csv">file2.csv</A></A><br></pre><hr></body></html>
I need to extract the file name and date of each entry.
I managed to read the right line, but I'm stuck on splitting it on <br>.
I tried something like this:
$string = "first line<br>second line <br> third line<br> end<br>"
write-host $string
$separator = "<br>"
$option = [System.StringSplitOptions]::RemoveEmptyEntries
$string.Split($separator, $option)
But this is the result I get:
first line<br>second line <br> third line<br> end<br>
fi
st line
second line
thi
d line
end
I have looked at the HTML Agility Pack, but in my case I don't have any tags in my page.
Do you have any advice? Thanks!
Upvotes: 1
Views: 1949
Reputation: 174900
The String.Split() method takes your string <br> and treats it as a [char] array, splitting on every single occurrence of any of <, b, r, or >.
Use the regex-based -split operator instead:
PS C:\> $String -split $separator |Where-Object {$_}
first line
second line
third line
end
The Where-Object {$_} pipeline element will filter out empty strings.
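To go from the split pieces to the file name and date the question asks for, something along these lines might work. It is only a sketch: it assumes the real file looks exactly like the sample line in the question, and the variable names and the regex in $pattern are made up for illustration. Since -split interprets its right-hand side as a regex, the separator is passed through [regex]::Escape, which is harmless for <br> and matters for separators that contain regex metacharacters.

```powershell
# Sample input: the <pre> line copied from the question (assumed to match the real file).
$line = '<pre><A HREF="/logs/folder/">[To Parent Directory]</A><br><br> jeudi 5 janvier 2017 19:38 116483 <A HREF="/folder/file1.csv">file1.csv</A><br> jeudi 5 janvier 2017 19:39 138397 <A HREF="/folder/file2.csv">file2.csv</A></A><br></pre><hr></body></html>'

# -split takes a regex; escaping keeps the separator literal.
$separator = [regex]::Escape('<br>')

# Hypothetical pattern: date text, size in bytes, then the link text as the file name.
$pattern = '^\s*(?<Date>.+?)\s+(?<Size>\d+)\s+<A HREF="[^"]*">(?<Name>[^<]+)</A>'

$entries = foreach ($part in ($line -split $separator)) {
    # Pieces that are empty or are not file entries simply fail the match and are skipped.
    if ($part -match $pattern) {
        [pscustomobject]@{
            Date = $Matches['Date']
            Name = $Matches['Name']
        }
    }
}

$entries

# Non-regex alternative: casting the separator to [string[]] selects the
# String.Split(string[], StringSplitOptions) overload instead of the char[] one.
$parts = $line.Split([string[]]'<br>', [System.StringSplitOptions]::RemoveEmptyEntries)
```

The Date property is left as the raw text from the listing; converting it to a [datetime] would depend on the culture the listing was generated with.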
Upvotes: 3