Bluz

Reputation: 6500

Find specific sentence in a web page using powershell

I need to use PowerShell to resolve IP addresses via WHOIS. My company filters port 43 and WHOIS queries, so my workaround is to have PowerShell query a website such as https://who.is, read the HTTP response, and look for the organisation name matching the IP address.

So far I have managed to read the web page into PowerShell. Here is an example WHOIS lookup for a yahoo.com IP address: https://who.is/whois-ip/ip-address/206.190.36.45

So here is my snippet:

$url=Invoke-WebRequest https://who.is/whois-ip/ip-address/206.190.36.45

Now, if I run:

$url.gettype()
IsPublic IsSerial Name                                     BaseType
-------- -------- ----                                     --------
True     False    HtmlWebResponseObject                    Microsoft.PowerShell.Commands.WebResponseObject

I see this object has several members (methods and properties):

Name              MemberType Definition
----              ---------- ----------
Equals            Method     bool Equals(System.Object obj)
GetHashCode       Method     int GetHashCode()
GetType           Method     type GetType()
ToString          Method     string ToString()
AllElements       Property   Microsoft.PowerShell.Commands.WebCmdletElementCollection AllElements {get;}
BaseResponse      Property   System.Net.WebResponse BaseResponse {get;set;}
Content           Property   string Content {get;}
Forms             Property   Microsoft.PowerShell.Commands.FormObjectCollection Forms {get;}
Headers           Property   System.Collections.Generic.Dictionary[string,string] Headers {get;}
Images            Property   Microsoft.PowerShell.Commands.WebCmdletElementCollection Images {get;}
InputFields       Property   Microsoft.PowerShell.Commands.WebCmdletElementCollection InputFields {get;}
Links             Property   Microsoft.PowerShell.Commands.WebCmdletElementCollection Links {get;}
ParsedHtml        Property   mshtml.IHTMLDocument2 ParsedHtml {get;}
RawContent        Property   string RawContent {get;}
RawContentLength  Property   long RawContentLength {get;}
RawContentStream  Property   System.IO.MemoryStream RawContentStream {get;}
Scripts           Property   Microsoft.PowerShell.Commands.WebCmdletElementCollection Scripts {get;}
StatusCode        Property   int StatusCode {get;}
StatusDescription Property   string StatusDescription {get;}

But every time I try a command like

$url.ToString() | select-string "OrgName"

PowerShell returns the whole HTML code, because Select-String receives the content as one single string. I found a workaround: dump the output into a file and then read the file back (so every line becomes an element of an array). But I have hundreds of IPs to check, so creating a file every time is not very efficient.
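A small offline sketch of the behaviour described above. The sample string is hypothetical and stands in for $url.ToString(); it shows that Select-String treats one multi-line string as a single "line", so the match it returns is the whole text.

```powershell
# Hypothetical stand-in for $url.ToString(): one string that contains several lines
$content = "NetRange:  206.190.32.0 - 206.190.63.255`r`nOrgName:   Yahoo! Broadcast Services, Inc.`r`nCity:      Sunnyvale"

# Select-String sees a single input object, so the "line" it matches is the entire string
$match = $content | Select-String -Pattern 'OrgName'
$match.Line -eq $content    # True: the whole text comes back, not just the OrgName line
```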

I would like to know how I can read the content of the web page https://who.is/whois-ip/ip-address/206.190.36.45 and get only the line that says: OrgName: Yahoo! Broadcast Services, Inc.

Thanks very much for your help! :)

Upvotes: 7

Views: 40896

Answers (3)

Matt

Reputation: 46730

There are most likely better ways to parse this, but you were on the right track with your current logic.

$web = Invoke-WebRequest https://who.is/whois-ip/ip-address/206.190.36.45
$web.tostring() -split "[`r`n]" | select-string "OrgName"

Select-String was returning the whole text because, previously, it was one long string. Using -split breaks it up into lines, so you get just the result you expected:

OrgName:        Yahoo! Broadcast Services, Inc.
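The same idea as a self-contained sketch, again with a hypothetical sample string in place of $web.ToString(). One deliberate variation: it splits on the regex \r?\n instead of the character class [`r`n]. With CRLF line endings the class produces an extra empty element between lines (harmless for Select-String, just noise), whereas \r?\n yields exactly one element per line.

```powershell
# Hypothetical sample text standing in for $web.ToString()
$content = "NetRange:  206.190.32.0 - 206.190.63.255`r`nOrgName:   Yahoo! Broadcast Services, Inc.`r`nCity:      Sunnyvale"

# Break the single string into one element per line (handles both CRLF and LF)
$lines = $content -split '\r?\n'

# Now Select-String tests each line separately and returns only the matching one
$orgLine = ($lines | Select-String -Pattern '^OrgName').Line
$orgLine    # OrgName:   Yahoo! Broadcast Services, Inc.
```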

Some string manipulation after that gives a cleaner answer. Again, there are many ways to approach this.

(($web.tostring() -split "[`r`n]" | select-string "OrgName" | Select -First 1) -split ":")[1].Trim()

I used Select -First 1 because Select-String could return more than one match; it ensures we are working with a single line when we manipulate the string. The string is then split on the colon and trimmed to remove the leftover spaces.
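A sketch of that clean-up step on single sample lines (hypothetical test data). Get-WhoisValue is a hypothetical helper, and it makes one change from the one-liner above: -split ':', 2 splits only on the first colon, so a value that itself contains a colon (such as a URL) stays intact instead of being cut at index [1].

```powershell
# Sample lines as they appear in the whois output (hypothetical test data)
$orgLine = 'OrgName:        Yahoo! Broadcast Services, Inc.'
$refLine = 'Ref:            https://whois.arin.net/rest/org/YAHO'

# Hypothetical helper: return the value part of a "Key: value" line
function Get-WhoisValue([string]$Line) {
    # Split into at most two parts: the key before the first colon and everything after it
    $key, $value = $Line -split ':', 2
    $value.Trim()    # remove the padding spaces between key and value
}

Get-WhoisValue $orgLine    # Yahoo! Broadcast Services, Inc.
Get-WhoisValue $refLine    # https://whois.arin.net/rest/org/YAHO
```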

Since you are pulling HTML data, we could also walk through those properties to get more specific results. The intention here was to arrive at the same kind of result as 1RedOne's answer.

$web = Invoke-WebRequest https://who.is/whois-ip/ip-address/206.190.36.45
$data = $web.AllElements | Where{$_.TagName -eq "Pre"} | Select-Object -Expand InnerText
$whois = ($data -split "`r`n`r`n" | select -index 1) -replace ":\s","=" | ConvertFrom-StringData
$whois.OrgName

In this example, all of that data is stored in the text of the PRE tag. I split the data into its sections (sections are separated by blank lines, so I look for consecutive newlines). The second section contains the org name. I convert it to a hashtable, store that in a variable, and read OrgName as a property: $whois.OrgName. Here is what $whois looks like:

Name                           Value                                                                                                                         
----                           -----                                                                                                                         
Updated                        2013-04-02                                                                                                                    
City                           Sunnyvale                                                                                                                     
Address                        701 First Ave                                                                                                                 
OrgName                        Yahoo! Broadcast Services, Inc.                                                                                               
StateProv                      CA                                                                                                                            
Country                        US                                                                                                                            
Ref                            http://whois.arin.net/rest/org/YAHO                                                                                           
PostalCode                     94089                                                                                                                         
RegDate                        1999-11-17                                                                                                                    
OrgId                          YAHO
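An offline sketch of that approach, using a hypothetical sample in place of the PRE tag's InnerText (the real who.is output may be laid out differently). It splits on a blank line, takes the second section, turns the first ": " on each line into "=", and hands the result to ConvertFrom-StringData, which trims keys and values.

```powershell
# Hypothetical stand-in for the PRE tag's InnerText: two sections separated by a blank line
$data = @(
    'NetRange:       206.190.32.0 - 206.190.63.255'
    'NetName:        NETBLK1-YAHOOBS'
    ''
    'OrgName:        Yahoo! Broadcast Services, Inc.'
    'OrgId:          YAHO'
    'City:           Sunnyvale'
) -join "`r`n"

# A blank line shows up as two consecutive line breaks; index 1 is the second section
$section = ($data -split '\r?\n\r?\n')[1]

# Turn "Key:   value" into "Key=  value"; ConvertFrom-StringData trims both sides
$whois = $section -replace ':\s', '=' | ConvertFrom-StringData

$whois.OrgName                   # Yahoo! Broadcast Services, Inc.
([pscustomobject]$whois).OrgId   # YAHO
```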

You can also make that hashtable into a custom object if you prefer dealing with those.

[pscustomobject]$whois

Updated    : 2017-01-28
City       : Sunnyvale
Address    : 701 First Ave
OrgName    : Yahoo! Broadcast Services, Inc.
StateProv  : CA
Country    : US
Ref        : https://whois.arin.net/rest/org/YAHO
PostalCode : 94089
RegDate    : 1999-11-17
OrgId      : YAHO

Upvotes: 18

FoxDeploy

Reputation: 13557

Here you go: the way to do this is indeed with Invoke-WebRequest. If we look at some of the properties of the object returned by Invoke-WebRequest, we can see that PowerShell has already parsed some of the HTML and text for us.

All we have to do is pick out the values we'd like to work with. For instance, taking a peek at the page text exposed through the ParsedHtml property ($Results.ParsedHtml.body.outerText), we see these results:

[Screenshot: Html Text]

These fields begin at about line 30. My approach relies on this useful data appearing midway down the page, so if we scrape the values from those lines, we're on our way to working with the data. The code for this first part is:

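$ipaddress = '206.190.36.45'
$url = "https://who.is/whois-ip/ip-address/$ipaddress"
$Results = Invoke-WebRequest $url

# ParsedHtml is only available in Windows PowerShell (it relies on Internet Explorer's HTML engine)
$ParsedResults = $Results.ParsedHtml.body.outerText.Split("`n")[30..50]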
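The [30..50] slice depends on the page layout never changing. A possible alternative, offered as an assumption rather than what this answer does, is to keep only lines shaped like "Key: value" wherever they appear. The page text below is a hypothetical sample standing in for $Results.ParsedHtml.body.outerText.

```powershell
# Hypothetical page text standing in for $Results.ParsedHtml.body.outerText
$pageText = @(
    'who.is'
    'IP Whois'
    'NetRange:       206.190.32.0 - 206.190.63.255'
    'OrgName:        Yahoo! Broadcast Services, Inc.'
    'City:           Sunnyvale'
    'Search another domain'
) -join "`n"

# Keep lines that start with a word, a colon and whitespace, wherever they sit on the page
$ParsedResults = $pageText.Split("`n") | Where-Object { $_ -match '^\w+:\s' }

$ParsedResults.Count    # 3
```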

PowerShell has a number of powerful commands to import and convert data between formats. For instance, if we replace the ':' colon character with an equals sign '=', we can send the whole block to ConvertFrom-StringData and get hashtables to work with. We can do that easily with the -replace operator, like this:

$Results.ParsedHtml.body.outerText.Split("`n")[30..50] -replace ":","="
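Note that -replace ":","=" rewrites every colon on a line, which is why the Ref value in the output further down comes out as http=//... A sketch of an alternative (not the answer's code) anchors the regex so only the colon after the key is replaced. The sample lines are hypothetical.

```powershell
# Hypothetical sample lines in "Key: value" form
$lines = @(
    'OrgName:        Yahoo! Broadcast Services, Inc.'
    'Ref:            http://whois.arin.net/rest/org/YAHO'
)

# Replace only the first colon: capture the key at the start of the line, drop ":" plus padding
$pairs = $lines -replace '^(\w+):\s*', '$1='

# Join into one string so ConvertFrom-StringData returns a single hashtable
$data = ($pairs -join "`n") | ConvertFrom-StringData

$data.Ref    # http://whois.arin.net/rest/org/YAHO
```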

I figured you might want to do this again in the future, so I turned the whole thing into a simple function for you. Throw this into your $Profile and enjoy.

So the finished result looks like this:

Function Get-WhoIsData {
  param($ipaddress='206.190.36.45')
  $url = "https://who.is/whois-ip/ip-address/$ipaddress"
  $Results = Invoke-WebRequest $url

  # Lines 30-50 of the page text hold the whois fields: turn "Key:value" into "Key=value" and parse them
  $ParsedResults = $Results.ParsedHtml.body.outerText.Split("`n")[30..50] -replace ":","=" | ConvertFrom-StringData

  $ParsedResults
}
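The question mentions hundreds of IPs. A hedged sketch of a batch version follows; both function names are hypothetical. It reads the raw Content property instead of ParsedHtml (which Windows PowerShell builds with Internet Explorer's engine and PowerShell 6+ does not provide) and extracts the value with a multiline regex. It assumes the page contains a line starting with "OrgName:" and does not decode HTML entities in the value.

```powershell
# Hypothetical helper: extract the OrgName value from raw whois page text
function Get-OrgNameFromText([string]$Text) {
    # (?m) makes ^ and $ match at each line; \s*$ also absorbs a trailing `r
    if ($Text -match '(?m)^OrgName:\s*(.+?)\s*$') {
        return $Matches[1]
    }
    return $null
}

# Hypothetical wrapper: look up several IPs through who.is and emit one object per IP
function Get-WhoIsOrgName {
    param([Parameter(Mandatory, ValueFromPipeline)][string[]]$IPAddress)
    process {
        foreach ($ip in $IPAddress) {
            # -UseBasicParsing skips the IE-based DOM parsing in Windows PowerShell
            $response = Invoke-WebRequest -Uri "https://who.is/whois-ip/ip-address/$ip" -UseBasicParsing
            [pscustomobject]@{
                IPAddress = $ip
                OrgName   = Get-OrgNameFromText $response.Content
            }
        }
    }
}
```

Usage might look like Get-Content .\ips.txt | Get-WhoIsOrgName, where ips.txt is a hypothetical file with one IP address per line.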

Using it looks like this:

PS C:\windows\system32> Get-WhoIsData -ipaddress 206.190.36.45
Name                           Value
----                           -----                                                                                                                                            
NetRange                       206.190.32.0 - 206.190.63.255                                                                                                                    
CIDR                           206.190.32.0/19                                                                                                                                  
NetName                        NETBLK1-YAHOOBS                                                                                                                                  
NetHandle                      NET-206-190-32-0-1                                                                                                                               
Parent                         NET206 (NET-206-0-0-0-0)                                                                                                                         
NetType                        Direct Allocation                                                                                                                                
OriginAS                                                                                                                                                                        
Organization                   Yahoo! Broadcast Services, Inc. (YAHO)                                                                                                           
RegDate                        1995-12-15                                                                                                                                       
Updated                        2012-03-02                                                                                                                                       
Ref                            http=//whois.arin.net/rest/net/NET-206-190-32-0-1                                                                                                
OrgName                        Yahoo! Broadcast Services, Inc.                                                                                                                  
OrgId                          YAHO                                                                                                                                             
Address                        701 First Ave                                                                                                                                    
City                           Sunnyvale                                                                                                                                        
StateProv                      CA                                                                                                                                               
PostalCode                     94089     

You can then pick out any of the properties you'd like with the usual Select-Object or Where-Object commands, or with dot notation. For example, to pull out just the OrgName property, use this command:

(Get-WhoIsData).OrgName
>Yahoo! Broadcast Services, Inc.
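Piping several strings into ConvertFrom-StringData, as Get-WhoIsData does, produces one single-entry hashtable per line rather than one combined table; (Get-WhoIsData).OrgName still works because PowerShell enumerates the property across the array. If you would rather have a single object, here is a sketch that merges them (the sample lines are hypothetical).

```powershell
# Hypothetical lines, already converted to "Key=value" form
$lines = 'OrgName=Yahoo! Broadcast Services, Inc.', 'OrgId=YAHO', 'City=Sunnyvale'

# Each piped string becomes its own hashtable
$tables = $lines | ConvertFrom-StringData
$tables.Count    # 3

# Merge them into one ordered dictionary, then into a single object
$merged = [ordered]@{}
foreach ($table in $tables) {
    foreach ($key in $table.Keys) {
        $merged[$key] = $table[$key]
    }
}
$whois = [pscustomobject]$merged
$whois.OrgId     # YAHO
```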

Upvotes: 6

Soheil

Reputation: 837

It is very simple to use Microsoft's Sysinternals Whois tool: put whois.exe in System32 (or the Windows directory), then in PowerShell run the whois command and use Select-String to get the organization name, like this:

PS C:\> whois.exe -v 206.190.36.45 | Select-String "Registrant Organization"

Registrant Organization: Yahoo! Inc.

I recommend this app because it gives more information for your work.

Upvotes: 9
