user1944429

Wget does not fetch Google search results

I noticed that when I run wget https://www.google.com/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=foo and similar queries, I get the Google homepage instead of the search results.

There seems to be some kind of redirect within the Google page. Does anyone know how to make wget fetch the actual results?

Upvotes: 6

Views: 10596

Answers (2)

Dolda2000

Reputation: 25855

The #q=foo part is your hint: it is a fragment identifier, which is never sent to the server. I'm guessing you copied this URL from your browser's address bar while using the live-search function. Since live search is implemented with a lot of client-side magic, you cannot rely on it working outside a browser; try using Google with live search disabled instead. A URL pattern that seems to work is http://www.google.com/search?hl=en&q=foo
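
To make the point about the fragment concrete, here is a small shell sketch that splits the URL from the question at the # character. The variable names (url, sent, fragment) are only for illustration; the split mirrors what an HTTP client does before it builds the request.

```shell
# The URL from the question. It is single-quoted so the shell does not
# treat '&' as a control operator and '#' as the start of a comment.
url='https://www.google.com/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=foo'

# Everything from the first '#' onward is the fragment; it stays on the client.
sent="${url%%#*}"        # part that is actually requested from the server
fragment="${url#*\#}"    # part that only the browser (JavaScript) ever sees

echo "sent to server:   $sent"
echo "kept client-side: $fragment"
```

The search term lives only in the fragment, so the server receives a plain homepage request with no query in it.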

However, I do notice that Google returns 403 Forbidden when called naïvely with wget, which suggests that they don't want automated clients fetching results this way. You can easily get past it by setting some other user-agent string, but do consider all the implications before doing so on a regular basis.
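
For reference, a minimal sketch of what setting the user-agent looks like with wget. The helper name fetch_google and the 'Mozilla/5.0' string are assumptions chosen for illustration, and I have not verified which strings Google currently accepts; --user-agent, -q and -O are standard wget options.

```shell
# Hypothetical helper: fetch a Google results page with a custom User-Agent.
# Assumes the query needs no URL-encoding (a single plain word such as "foo").
fetch_google() {
  query="$1"   # search term
  out="$2"     # file to write the response to
  # 'Mozilla/5.0' is a placeholder User-Agent, not a value known to be accepted.
  wget -q --user-agent='Mozilla/5.0' -O "$out" \
    "http://www.google.com/search?hl=en&q=${query}"
}

# Usage (performs a real network request, so it is left commented out):
# fetch_google foo results.html
```

Quoting the URL matters here as well, because an unquoted & would end the command early in the shell.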

Upvotes: 7

anubhava

Reputation: 785611

You can use these curl commands to pull Google query results:

curl -sA "Chrome" -L 'http://www.google.com/search?hl=en&q=time' -o search.html

For an HTTPS URL:

curl -k -sA "Chrome" -L 'https://www.google.com/search?hl=en&q=time' -o ssearch.html

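The -A option sets a custom User-Agent string (here, Chrome) in the request to Google. The other options: -s silences progress output, -L follows redirects, and -k skips TLS certificate verification.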

Upvotes: 12
