Pythonista

Reputation: 11645

Scraping Google web results not working

Why doesn't the following code work for scraping Google's search results?

It fails when opening the response, throwing an HTTPError. I've looked at other questions and, as far as I can tell, I've done the encoding etc. properly.

I know I haven't included error handling etc.; this is just a minimal version.

def scrape_google(query):

    url = "http://ajax.googleapis.com/ajax/services/search/web?v=1.0&"
    headers = {'User-Agent': 'Mozilla/5.0'}
    search = urllib.parse.urlencode({'q': " ".join(term for term in query)})
    b_search = search.encode("utf-8")
    response = urllib.request.Request(url, b_search, headers)
    page = urllib.request.urlopen(response)

Upvotes: 1

Views: 708

Answers (2)

Dmitriy Zub

Reputation: 1724

  • As people said in the comments, the requests library (combined with beautifulsoup if needed) is a better choice. I have also answered a separate question about scraping Google search results.

  • Alternatively, you can use the third-party Google Organic Results API from SerpApi. It's a paid API with a free trial.

SerpApi also has a playground where you can test queries.

Code to integrate (say you want to scrape title, summary and a link):

import os
from serpapi import GoogleSearch

params = {
    "engine": "google",
    "q": "best lasagna recipe ever",
    "api_key": os.getenv("API_KEY"),
}

search = GoogleSearch(params)
results = search.get_dict()

for result in results["organic_results"]:
    print(f"Title: {result['title']}\nSummary: {result['snippet']}\nLink: {result['link']}")

Output:

Title: The BEST Lasagna Recipe Ever! | The Recipe Critic
Summary: How to Make The BEST Classic Lasagna Ever. Sauté meat then simmer with bases and seasonings: In a large skillet over medium high heat add the olive oil and onion. Cook lasagna noodles: In a large pot, bring the water to a boil. Mix cheeses together: In medium sized bowl add the ricotta cheese, parmesan, and egg.
Link: https://therecipecritic.com/lasagna-recipe/

Title: The Most Amazing Lasagna Recipe - The Stay At Home Chef
Summary: The Most Amazing Lasagna Recipe is the best recipe for homemade Italian-style lasagna. The balance ... This recipe is so good—it makes the kind of lasagna people write home about! ... Hands down absolutely the best lasagna recipe ever!
Link: https://thestayathomechef.com/amazing-lasagna-recipe/

Disclaimer: I work for SerpApi.

Upvotes: 0

dstudeba

Reputation: 9048

It is not working because that URL returns JSON. Take that URL and add a search term, like this:

http://ajax.googleapis.com/ajax/services/search/web?v=1.0&q=bingo

You will get the results back as JSON, which beautifulsoup is not designed to parse (but JSON is a lot nicer to work with than scraped HTML).

{"responseData": 
     {"results":
   [{"GsearchResultClass":"GwebSearch","unescapedUrl":"http://www.pogo.com/games/bingo-luau","url":"http://www.pogo.com/games/bingo-

//etc
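As a rough sketch, the query-string form of the URL above can be built with urllib.parse.urlencode. Note that passing a data argument to urllib.request.Request turns it into a POST, whereas the URL above carries the search term in the query string; that may be a factor in the HTTPError, though nothing in this thread confirms it. The names BASE_URL and build_request are hypothetical, and the snippet only builds the request without sending it.

```python
import urllib.parse
import urllib.request

# Endpoint taken from the question; BASE_URL and build_request are illustrative names.
BASE_URL = "http://ajax.googleapis.com/ajax/services/search/web"


def build_request(terms):
    """Build a GET request with the search terms encoded into the query string."""
    query = urllib.parse.urlencode({"v": "1.0", "q": " ".join(terms)})
    # No data argument is passed, so urllib issues a GET rather than a POST.
    return urllib.request.Request(
        f"{BASE_URL}?{query}",
        headers={"User-Agent": "Mozilla/5.0"},
    )


req = build_request(["bingo"])
print(req.full_url)
print(req.get_method())
```

The resulting request could then be passed to urllib.request.urlopen, assuming the endpoint is still reachable.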

Edited to add:

Using requests:

import requests

url = 'http://ajax.googleapis.com/ajax/services/search/web?v=1.0&q=bingo'
resp = requests.get(url)
print(resp.content)  # raw response body, as bytes

generates:

b'{"responseData": {"results":[{"GsearchResultClass":"GwebSearch","unescapedUrl":"http://www.pogo.com/games/b...
//etc    
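Since the body is JSON, it can be parsed with the standard json module rather than an HTML parser. Here is a minimal sketch, assuming the response has the shape shown above (responseData, then results, each with an unescapedUrl). SAMPLE is a hand-written stand-in for the truncated output, and extract_urls is a hypothetical helper name.

```python
import json

# Hypothetical sample payload, shaped like the truncated response shown above.
SAMPLE = (
    b'{"responseData": {"results": [{"GsearchResultClass": "GwebSearch", '
    b'"unescapedUrl": "http://www.pogo.com/games/bingo-luau"}]}}'
)


def extract_urls(raw):
    """Return the unescapedUrl of each result in a raw JSON response body."""
    data = json.loads(raw)  # accepts bytes or str
    # responseData may be missing or null, so fall back to an empty dict.
    response_data = data.get("responseData") or {}
    return [
        result["unescapedUrl"]
        for result in response_data.get("results", [])
        if "unescapedUrl" in result
    ]


urls = extract_urls(SAMPLE)
print(urls)
```

With requests, the same function could be given resp.content, or resp.json() could be used to get the decoded dictionary directly.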

Upvotes: 2
