nixda

Reputation: 2707

Get Google Search results via PowerShell

Let's say you only have the artist and title from a music file but you don't know the album name.

When you do a Google search in Chrome for, e.g., Golden Earring Radar Love Album, you get:

[Screenshot: Google result page with the album info box]

You see the album name (Moontan), the release date (July 1973) and even the correct album cover. What is this page section called? Google Preview? Google Instant Page? I don't know.

My question is

How do I programmatically get this information via PowerShell?

What I have tried

  1. Invoke-WebRequest: Not working, specific content not in response

    $Response = Invoke-WebRequest -URI "https://www.google.com/search?hl=en&q=Golden+Earring+Radar+Love+Album"
    $Response.content | Set-Content D:\test.txt
    
  2. XmlHttpRequest: Not working, specific content not in response

    $objXmlHttp = New-Object -ComObject MSXML2.ServerXMLHTTP
    $objXmlHttp.Open("GET", "https://www.google.com/search?hl=en&q=Golden+Earring+Radar+Love+Album")
    $objXmlHttp.Send()
    $objXmlHttp.responseText | Set-Content D:\test.txt
    
  3. Invoke-RestMethod: Not working, retrieves only URLs and their snippets

    $Response = Invoke-RestMethod -Uri 'https://ajax.googleapis.com/ajax/services/search/web?v=1.0&q=Golden%20Earring%20Radar%20Love%20Album'
    $Response.responseData.results
    
  4. I looked for a Google Play or Google Music API which can be used within PowerShell

I believe the problem is that this information is loaded via JavaScript, which is not executed when using methods like Invoke-WebRequest. I could be wrong here.
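
One way to check this assumption is to search the raw response text for values that are visible in the browser, such as the album name. The following is only a minimal sketch: Test-ResponseContains is a hypothetical helper name, and the sample HTML string is a made-up stand-in for $Response.Content.

```powershell
# Hypothetical helper: reports which terms occur in a block of response text
function Test-ResponseContains {
    param(
        [string]$Content,
        [string[]]$Terms
    )
    foreach ($term in $Terms) {
        [pscustomobject]@{
            Term  = $term
            # Case-insensitive substring check
            Found = $Content.IndexOf($term, [StringComparison]::OrdinalIgnoreCase) -ge 0
        }
    }
}

# Made-up sample text standing in for $Response.Content
$sample = '<html><body>Radar Love - Golden Earring</body></html>'
$result = Test-ResponseContains -Content $sample -Terms 'Moontan', 'Radar Love'
```

If a term such as Moontan is reported as not found in the real response, the data is either loaded later by script or not sent to this client at all.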

I see two solutions: 1) Imitate a web browser within PowerShell and load the whole website into a form, or 2) use Fiddler to see when and how this extra information is loaded. I would prefer the second solution, but both are beyond my knowledge.


Background, to avoid comments saying "There are other services like XYZ which better fit your needs"

I already have working PowerShell scripts that get the album name and additional info from just a given artist and track title for numerous services, including Amazon, Deezer, Discogs, EchoNest, iTunes, Last.fm, MusicBrainz, Napster, rdio and Spotify, because they all offer an easy-to-use API (except Amazon, whose implementation is pretty hard).

I ran some tests against ~3000 music files, given only the artist and track title, to retrieve the corresponding album name. When I compared the results with Google, I noticed that none of the above services was as accurate as Google.

Upvotes: 1

Views: 11787

Answers (2)

pavol.kutaj

Reputation: 539

  • Open the PowerShell profile: ii $profile
  • Paste the following snippet into the profile:
Function Search-Google {
        # Join all arguments into one query and URL-encode it
        $query = [uri]::EscapeDataString($args -join ' ')
        # Open the search page in the default browser
        Start-Process "https://www.google.com/search?q=$query"
}

Set-Alias glg Search-Google
  • Restart the PowerShell session.
  • From the console, run the new command: glg hello world
  • No quotes are needed around the search terms.
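
For the question's use case, where the page should be fetched by a script rather than opened in a browser, the URL-building step can be separated from launching the browser. This is a minimal sketch under assumptions: Get-GoogleSearchUrl is a hypothetical helper name, not part of the snippet above, and it only builds the URL. It says nothing about whether Google's response will contain the album details.

```powershell
# Hypothetical helper: builds a Google search URL from the given words
function Get-GoogleSearchUrl {
    # URL-encode the joined words so characters like & or / survive in the query
    $query = [uri]::EscapeDataString($args -join ' ')
    "https://www.google.com/search?hl=en&q=$query"
}

$url = Get-GoogleSearchUrl Golden Earring Radar Love Album
```

The returned string can be passed to Start-Process as before, or used as the -Uri value of Invoke-WebRequest.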

Upvotes: 6

Ben Randall

Reputation: 1283

It's quite possible that Google returns different results depending on the user agent making the request. In your case you're not passing a browser user agent, so Google assumes the client is not a browser and limits the amount of information it returns (maybe to make your parsing a little easier).

You have a few options; two of them are:

  1. As suggested by @AlexanderObersht, use Fiddler to sniff some of the network traffic, see what additional headers a browser sends by default, and fiddle around (pun intended) with them to see if you can make it work.
    • With Invoke-RestMethod or Invoke-WebRequest you will need to add a -Headers parameter.
    • With XMLHttpRequest you will have to set the headers through the appropriate properties.
  2. If you don't want to deal with the browser details, you can just automate IE directly from PowerShell. I've got a sample shown below.


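$ie = New-Object -ComObject InternetExplorer.Application -ErrorAction Stop
$ie.Visible = $true
$ie.Navigate("https://www.bing.com")
# Wait until the page has finished loading (ReadyState 4 = complete)
while ($ie.Busy -or $ie.ReadyState -ne 4) { Start-Sleep -Milliseconds 100 }

# Placeholder: work with the loaded DOM here, e.g. read the visible page text
$ie.Document.body.innerText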
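
For option 1, the headers can be supplied roughly as follows. This is a sketch under assumptions, not a confirmed fix: the User-Agent and Accept-Language values are illustrative rather than captured with Fiddler, Get-PageAsBrowser is a hypothetical helper name, and nothing here guarantees that Google returns the album details for such a request.

```powershell
# Assumption: illustrative header values, not captured from a real browser session
$browserUserAgent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
$browserHeaders = @{ 'Accept-Language' = 'en-US,en;q=0.9' }

# Hypothetical helper: requests a page while presenting browser-like headers
function Get-PageAsBrowser {
    param(
        [Parameter(Mandatory)][string]$Uri,
        [string]$UserAgent = $browserUserAgent,
        [hashtable]$Headers = $browserHeaders
    )
    # -UserAgent sets the User-Agent header; -Headers adds any other headers
    Invoke-WebRequest -Uri $Uri -UserAgent $UserAgent -Headers $Headers -UseBasicParsing
}
```

Calling Get-PageAsBrowser -Uri "https://www.google.com/search?hl=en&q=Golden+Earring+Radar+Love+Album" returns the response object, and its Content property can then be compared with what Fiddler shows for a real browser.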

Upvotes: 1
