Sebi

Reputation: 3979

How to get Google's "fast answer box" text?

I'm trying to get the text of Google's "fast answer box". The screenshot shows what I mean by that: [screenshot]

Google shows this box when you enter a search and it knows the answer, so you don't need to open any of the links below it. The box appears for the following query:

https://google.de/search?q=definition%20calcium

Now I want to read this text with a Python script. I wrote a method that uses requests and Beautiful Soup to do this:

def execute(self):
    response = requests.get(url='https://google.de/search?q=definition%20calcium', proxies=self._proxy)
    soup = BeautifulSoup(response.content, 'html.parser')
    return soup.find_all("ol", class_="lr_dct_sf_sens")

The method always returns [], an empty list. But in the Chrome console I can find exactly this element:

[screenshot]

So I can't understand why it isn't found. For testing, I wrote the whole content from requests.get into a file like this:

# Write as UTF-8 so non-ASCII characters in the page don't fail on Windows
with open('C:\\Users\\me\\Desktop\\test.txt', 'w', encoding='utf-8') as file:
    file.write(response.text)

I searched the file with Notepad, but I can't find the search pattern there either. I'm not sure whether response.text cuts off some details.

Can someone explain this to me? How can I get this text?

Upvotes: 1

Views: 1351

Answers (3)

Dmitriy Zub

Reputation: 1724

In my opinion, the easiest way is to grab the CSS selectors for this text with the SelectorGadget Chrome extension and use them with BeautifulSoup's select() or select_one() methods.

Also, the problem could be that you don't specify a user-agent. The user-agent header makes the request look like a real user's visit, so that Google (or another website) doesn't block it.

from bs4 import BeautifulSoup
import requests, lxml

headers = {
    'User-agent':
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}

html = requests.get('https://www.google.de/search?q=definition%20calcium', headers=headers)
soup = BeautifulSoup(html.text, 'lxml')

syllables = soup.select_one('.frCXef span').text
phonetic = soup.select_one('.g30o5d span span').text
noun = soup.select_one('.h3TRxf span').text
print(f'{syllables}\n{phonetic}\n{noun}')

# Output:
'''
cal·ci·um
ˈkalsēəm
the chemical element of atomic number 20, a soft gray metal.
'''
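
Note that Google's generated class names such as .frCXef change over time, and select_one() returns None when nothing matches, so the .text calls above can raise AttributeError. A minimal sketch of a more defensive lookup follows; text_or_none is a hypothetical helper, and the sample HTML is made up for illustration and only imitates the structure of the answer box:

```python
from bs4 import BeautifulSoup


def text_or_none(soup, selector):
    """Return the stripped text of the first match for selector, or None."""
    node = soup.select_one(selector)
    return node.get_text(strip=True) if node is not None else None


# Made-up markup (an assumption), not a real Google response.
sample_html = '<div class="frCXef"><span>cal·ci·um</span></div>'
sample_soup = BeautifulSoup(sample_html, 'html.parser')

syllables = text_or_none(sample_soup, '.frCXef span')  # selector matches
noun = text_or_none(sample_soup, '.h3TRxf span')       # selector is absent

if noun is None:
    print('definition selector did not match; the markup may have changed')
```

Checking for None this way makes a changed selector show up as a clear message instead of a traceback.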

Alternatively, you can do the same thing using Google Direct Answer Box API from SerpApi, except you don't have to figure out how to grab certain HTML elements. It's a paid API with a free trial of 5,000 searches.

Code to integrate:

from serpapi import GoogleSearch

params = {
  "api_key": "YOUR_API_KEY",
  "engine": "google",
  "q": "definition calcium",
  "google_domain": "google.com",
}

search = GoogleSearch(params)
results = search.get_dict()

syllables = results['answer_box']['syllables']
phonetic = results['answer_box']['phonetic']
noun = results['answer_box']['definitions'][0] # specifying index since the output is an array
print(f'{syllables}\n{phonetic}\n{noun}')

# Output:
'''
cal·ci·um
ˈkalsēəm
the chemical element of atomic number 20, a soft gray metal.
'''

Disclaimer: I work for SerpApi.

Upvotes: 0

Hartator

Reputation: 5145

SerpApi fully supports dictionary results that appear inside Google direct answer boxes. For example:

$ curl "https://serpapi.com/search.json?q=definition%20calcium&google_domain=google.de"
...
  "answer_box": {
    "type": "dictionary_results",
    "syllables": "cal·ci·um",
    "phonetic": "/ˈkalsēəm/",
    "word_type": "noun",
    "definitions": [
      "the chemical element of atomic number 20, a soft gray metal."
    ]
  },
...

Some documentation for dictionary results is here: https://serpapi.com/direct-answer-box-api
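
Once the JSON is parsed into a Python dict, the answer_box fragment above could be read roughly like this. This is only a sketch: first_definition is a hypothetical helper, and it assumes that a query without a dictionary answer box either omits the "answer_box" key or carries a different "type".

```python
def first_definition(results):
    """Return (syllables, phonetic, first definition), or None if absent."""
    box = results.get('answer_box') or {}
    if box.get('type') != 'dictionary_results':
        return None
    definitions = box.get('definitions') or []
    if not definitions:
        return None
    return box.get('syllables'), box.get('phonetic'), definitions[0]


# Sample copied from the response fragment shown above.
sample = {
    "answer_box": {
        "type": "dictionary_results",
        "syllables": "cal·ci·um",
        "phonetic": "/ˈkalsēəm/",
        "word_type": "noun",
        "definitions": [
            "the chemical element of atomic number 20, a soft gray metal."
        ],
    }
}

result = first_definition(sample)
print(result[2])                       # the first definition string
missing = first_definition({})         # no answer box in the response
```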

Upvotes: -1

Zroq

Reputation: 8382

If you watch the network requests closely while that page loads, you'll see that Google fires off another request to a URL that contains your data.

Please try to access this in your browser:

https://www.google.com/search?q=definition:+calcium&bav=on.2,or.r_cp.&cad=b&fp=1&biw=1920&bih=984&dpr=1&tch=1&ech=1&psi=1489578048971.3

It'll download a file that contains your fast answer box data. You can search that file for "the chemical element of atomic number" to verify this.

You'll have to clean the file and scrape the data that you want.
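
As a rough sketch of that cleaning step, the following strips tags and HTML entities from the downloaded text and returns the text around a marker phrase. The exact format of the file is an assumption here: the sketch treats it as ordinary HTML fragments, so if the file uses other escaping it would need decoding first. find_snippet and the sample string are made up for illustration.

```python
import html
import re


def find_snippet(raw_text, marker, width=80):
    """Strip tags and entities from raw_text, then return text from marker on."""
    no_tags = re.sub(r'<[^>]+>', ' ', raw_text)   # replace HTML tags with spaces
    plain = html.unescape(no_tags)                # decode entities such as &amp;
    plain = re.sub(r'\s+', ' ', plain).strip()    # collapse runs of whitespace
    start = plain.find(marker)
    if start == -1:
        return None                               # marker not in the file
    return plain[start:start + width]


# Made-up stand-in for the downloaded file's content (an assumption).
sample = ('<li><span>the chemical element of atomic number 20, '
          'a soft gray metal.</span></li><li>more &amp; more</li>')

snippet = find_snippet(sample, 'the chemical element of atomic number')
print(snippet)
```

For the real file you would read its content with open(..., encoding='utf-8') and pass that string in instead of sample.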

Upvotes: 2
