I am trying to use PowerShell to get the src of the images on a web page (easy), but only for the images inside a certain div element. I tried:
```powershell
$ie = New-Object -com InternetExplorer.Application
$ie.visible = $false
$ie.navigate('https://www.lachainemeteo.com/meteo-belgique/ville-14875/previsions-meteo-tournai-demain')
While ($ie.Busy -eq $true){Start-Sleep -seconds 1;}
Foreach($q in $ie.document.body.getElementsByClassName("quarter").GetElementsByElementName("img"))
{
    Write-Output $q.src
}
```
But this gives an error in the ISE: `Method invocation failed because [System.__ComObject] does not contain a method named 'GetElementsByElementName'.` The `quarter` class itself works: I am able to get the innerText of each of the five `quarter` divs on the page. The difficulty is in grabbing the images within each `quarter` div.
Here is an image of what the HTML looks like: http://www.wimgielis.com/a.png
Can anyone point out my error, please? Thanks!
That error is quite specific; searching for the exact message will turn up the details.
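For reference, the message appears because there is no DOM method named GetElementsByElementName; the standard method is getElementsByTagName, and it has to be called on each element returned by getElementsByClassName, not on the returned collection itself. Below is a minimal sketch of the question's COM loop with that change. It assumes Windows with the InternetExplorer.Application COM object still available, and that the page still contains div elements with class `quarter`; the function name Get-QuarterImageSource is hypothetical.

```powershell
function Get-QuarterImageSource {
    # Hypothetical helper: drives Internet Explorer through COM (Windows only) and
    # returns the src of every <img> inside elements that carry the given class.
    param(
        [Parameter(Mandatory)]
        [string]$Uri,
        [string]$ClassName = 'quarter'
    )
    $ie = New-Object -ComObject InternetExplorer.Application
    try {
        $ie.Visible = $false
        $ie.Navigate($Uri)
        # Wait until the document is fully loaded (READYSTATE_COMPLETE = 4), not only until Busy clears.
        while ($ie.Busy -or $ie.ReadyState -ne 4) { Start-Sleep -Milliseconds 500 }

        # getElementsByClassName returns a collection; getElementsByTagName
        # is called on each element in that collection, one div at a time.
        foreach ($div in $ie.Document.body.getElementsByClassName($ClassName)) {
            foreach ($img in $div.getElementsByTagName('img')) {
                $img.src
            }
        }
    }
    finally {
        # Close the hidden IE instance so it does not linger in the background.
        $ie.Quit()
    }
}
```

Usage would then be `Get-QuarterImageSource -Uri 'https://www.lachainemeteo.com/meteo-belgique/ville-14875/previsions-meteo-tournai-demain'`. The try/finally is there so the hidden IE process is closed even if the loop throws.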
All that being said, walking a web page is a very common task in PowerShell, and there are many examples on Stack Overflow and other sites on the topic:
https://stackoverflow.com/search?q=%5Bpowershell%5D+%27parse+webpage%27
When you walk the page, you have to ask specifically for the objects it contains. Also, if you are just after the source, there is no reason to open IE or any other browser. That is what the web cmdlets, such as Invoke-WebRequest (which gets content from a web page on the Internet), are for.
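The response's Images collection lists every image on the page, but not which div each one sits in, so scoping to one div means looking at that div's own HTML. A minimal sketch of that last step, with no browser involved: the helper name Get-ImgSrcFromHtml is hypothetical, and it swaps in a regular expression instead of a DOM parser, so it assumes well-behaved markup with quoted src attributes.

```powershell
function Get-ImgSrcFromHtml {
    # Hypothetical helper: returns the src value of every <img> tag in an HTML fragment.
    # Regex-based, not a full HTML parser; assumes src values are wrapped in ' or ".
    param(
        [Parameter(Mandatory)]
        [AllowEmptyString()]
        [string]$Html
    )
    # \ssrc requires whitespace before "src", so attributes such as data-src are skipped.
    $pattern = '<img\b[^>]*?\ssrc\s*=\s*["'']([^"'']+)["'']'
    foreach ($m in [regex]::Matches($Html, $pattern, 'IgnoreCase')) {
        $m.Groups[1].Value
    }
}
```

On Windows PowerShell 5.1, where the response exposes AllElements, one way to apply it to the question's page might be `(Invoke-WebRequest -Uri 'https://www.lachainemeteo.com/meteo-belgique/ville-14875/previsions-meteo-tournai-demain').AllElements | Where-Object Class -EQ 'quarter' | ForEach-Object { Get-ImgSrcFromHtml -Html $_.innerHTML }`. Values taken from raw HTML may be relative URLs.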
Here are some things you can do without a browser or COM instance to walk the page and see what is actually accessible, before making further attempts to interact with it:
### How to scrape a web page with PowerShell
```powershell
# Note: AllElements, Forms and ParsedHtml exist in Windows PowerShell 5.1.
# PowerShell 6+ returns a BasicHtmlWebResponseObject without them.
$w = Invoke-WebRequest -Uri 'https://www.reddit.com/r/PowerShell'

# TypeName
$w | Get-Member

<#
   TypeName: Microsoft.PowerShell.Commands.HtmlWebResponseObject

Name              MemberType   Definition
----              ----------   ----------
Dispose           Method       void Dispose(), void IDisposable.Dispose()
Equals            Method       bool Equals(System.Object obj)
GetHashCode       Method       int GetHashCode()
GetType           Method       type GetType()
ToString          Method       string ToString()
AllElements       Property     Microsoft.PowerShell.Commands.WebCmdletElementCollection AllElements {get;}
BaseResponse      Property     System.Net.WebResponse BaseResponse {get;set;}
Content           Property     string Content {get;}
Forms             Property     Microsoft.PowerShell.Commands.FormObjectCollection Forms {get;}
Headers           Property     System.Collections.Generic.Dictionary[string,string] Headers {get;}
Images            Property     Microsoft.PowerShell.Commands.WebCmdletElementCollection Images {get;}
InputFields       Property     Microsoft.PowerShell.Commands.WebCmdletElementCollection InputFields {get;}
Links             Property     Microsoft.PowerShell.Commands.WebCmdletElementCollection Links {get;}
ParsedHtml        Property     mshtml.IHTMLDocument2 ParsedHtml {get;}
RawContent        Property     string RawContent {get;set;}
RawContentLength  Property     long RawContentLength {get;}
RawContentStream  Property     System.IO.MemoryStream RawContentStream {get;}
Scripts           Property     Microsoft.PowerShell.Commands.WebCmdletElementCollection Scripts {get;}
StatusCode        Property     int StatusCode {get;}
StatusDescription Property     string StatusDescription {get;}
MSDN              ScriptMethod System.Object MSDN();
#>

$w.StatusCode
$w.AllElements
$w.AllElements.Count
$w.Links.Count
$w.Links
$w.Forms
$w.Forms[0].Fields
$w.RawContent
$w.ParsedHtml

$w = Invoke-WebRequest -Uri 'https://en.wikipedia.org/wiki/PowerShell'
$w.AllElements.Count
$w.Links.Count
$w.AllElements |
    Where-Object -Property 'TagName' -EQ 'P' |
    Select-Object -Property 'InnerText'

$w = Invoke-WebRequest -Uri 'https://www.reddit.com/r/aww'
$w.Links

$w = Invoke-WebRequest -Uri 'https://www.reddit.com/r/PowerShell'
$w.AllElements |
    Where-Object -Property 'TagName' -EQ 'H2' |
    Select-Object -Property 'InnerText'

$w = Invoke-WebRequest -Uri 'https://darksky.net/forecast/41.8756,-87.6244/us12/en'
$w.AllElements |
    Where-Object Class -EQ 'summary swap' |
    Select-Object -Property 'OuterText'
```
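A filter like `Where-Object Class -EQ 'summary swap'` only matches when the class attribute is exactly that string, so an element with `class="quarter active"` would be missed by `-EQ 'quarter'`. A small sketch of matching on individual class tokens instead; the helper name Select-ElementByClass is hypothetical, and it only assumes the piped objects expose a `class` property, as the AllElements entries do in Windows PowerShell 5.1.

```powershell
function Select-ElementByClass {
    # Hypothetical helper: passes through elements whose class attribute contains
    # the given class name as one of its whitespace-separated tokens.
    param(
        [Parameter(Mandatory)]
        [string]$ClassName,
        [Parameter(ValueFromPipeline)]
        [object]$InputObject
    )
    process {
        if ($null -eq $InputObject) { return }
        # Elements without a class attribute yield an empty string here.
        $classAttr = [string]$InputObject.'class'
        $tokens = $classAttr -split '\s+' | Where-Object { $_ }
        # HTML class names are case-sensitive, hence -ccontains.
        if ($tokens -ccontains $ClassName) { $InputObject }
    }
}
```

With a Windows PowerShell 5.1 response in `$w`, usage could look like `$w.AllElements | Select-ElementByClass -ClassName 'quarter' | Select-Object -Property 'InnerText'`.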
Also, note that some websites deliberately block automated access and will simply return errors when you try.