Reputation: 339
I need to scrape the "People also ask" box from Google for questions and answers.
I run a search on Google, then parse the result with BeautifulSoup.
import requests
from bs4 import BeautifulSoup

link = "https://www.google.com/search?client=firefox-b-d&source=hp&ei=v0mUXPu2ApTljwS6iLnABA&ei=lAyVXMPFCsaUsgXqmZT4DQ&q=what+is+java"
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
page = requests.get(link, headers=headers)
soup = BeautifulSoup(page.content, 'html.parser')

# For answers:
mydivs = soup.find_all('div', class_="ILfuVd NA6bn")
The result is an empty list. I checked the HTML file, and the answers are in fact under that class. What is wrong with my code?
Upvotes: 1
Views: 4576
Reputation: 86
Google's front page updates as you type in the search box, so a simple request to the search page won't return those results.
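One way to check this is to look for the class in the HTML your script actually receives, rather than in what the browser shows. Here is a minimal sketch using only the standard library. `count_class_matches` is a hypothetical helper name, and unlike BeautifulSoup's `class_` filter it accepts the class names in any order:

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count start tags whose class attribute contains all the wanted class names."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted.split())
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # A class attribute without a value comes through as None
        classes = set((dict(attrs).get("class") or "").split())
        if self.wanted <= classes:
            self.count += 1


def count_class_matches(html, wanted):
    """Return how many elements in html carry every class listed in wanted."""
    parser = ClassCounter(wanted)
    parser.feed(html)
    return parser.count
```

Calling it as `count_class_matches(page.text, "ILfuVd NA6bn")` with the `page` from the question should show whether the class is present in the raw response. If it returns 0 while the browser's inspector shows the element, the content is most likely being added after the initial response.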
You can go to https://google.com in your browser, open the Development Tools panel (usually F12) and watch the Network tab to see the underlying requests being made to the autocomplete API.
It looks like the endpoint is https://www.google.com/complete/search?q=yourQueryHere&client=psy-ab, which is easier to query than an HTML page:
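import requests

query = "what is java"
# Let requests URL-encode the query instead of concatenating it by hand
res = requests.get("https://google.com/complete/search", params={"client": "psy-ab", "q": query})
print(res.text)  # print the response body, not just the Response object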
Also, Google probably doesn't want people to scrape this, so you will probably hit rate limiting if you make too many requests.
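To soften the rate-limiting problem, repeated calls can be spaced out with an increasing delay, and the query can be URL-encoded up front. This is a minimal sketch, assuming the endpoint and the `client=psy-ab` parameter from above still behave as described (not verified here). `build_suggest_url` and `call_with_backoff` are hypothetical helper names, and retrying on any exception is a simplification:

```python
import time
from urllib.parse import urlencode

# Endpoint taken from the answer above; that it still responds this way is an assumption.
SUGGEST_ENDPOINT = "https://www.google.com/complete/search"


def build_suggest_url(query, client="psy-ab"):
    """Return the autocomplete URL with the query safely URL-encoded."""
    return SUGGEST_ENDPOINT + "?" + urlencode({"client": client, "q": query})


def call_with_backoff(fetch, max_tries=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds, doubling the wait after each failure.

    fetch is any zero-argument callable that raises on failure,
    for example on an HTTP 429 response.
    """
    for attempt in range(max_tries):
        try:
            return fetch()
        except Exception:
            if attempt == max_tries - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
```

With requests, `fetch` could be a small function that calls `requests.get(build_suggest_url(query), timeout=10)` and then `raise_for_status()` on the response, so that HTTP errors trigger a retry.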
Upvotes: 1
Reputation: 1724
You could use selenium click methods or other libraries that can simulate clicks. Alternatively, SerpApi returns the questions and answers directly. Code and example output:
import os
from serpapi import GoogleSearch

params = {
    "engine": "google",
    "q": "what is java",
    "api_key": os.getenv("API_KEY"),
}

search = GoogleSearch(params)
results = search.get_dict()

for q_and_a in results['related_questions']:
    print(f"Question: {q_and_a['question']}\nAnswer: {q_and_a['snippet']}\n")
Question: What is Java and why do I need it?
Answer: Java is a programming language and computing platform first released by Sun Microsystems in 1995. There are lots of applications and websites that will not work unless you have Java installed, and more are created every day. Java is fast, secure, and reliable.
Question: What is Java used for?
Answer: One of the most widely used programming languages, Java is used as the server-side language for most back-end development projects, including those involving big data and Android development. Java is also commonly used for desktop computing, other mobile computing, games, and numerical computing.Apr 12, 2019
Question: What is Java in simple words?
Answer: Java is a high-level programming language developed by Sun Microsystems. Instead, Java programs are interpreted by the Java Virtual Machine, or JVM, which runs on multiple platforms. ... This means all Java programs are multiplatform and can run on different platforms, including Macintosh, Windows, and Unix computers.Apr 19, 2012
Question: What is Java and its types?
Answer: The types of the Java programming language are divided into two categories: primitive types and reference types. The primitive types (§4.2) are the boolean type and the numeric types. The numeric types are the integral types byte , short , int , long , and char , and the floating-point types float and double .
Disclaimer: I work for SerpApi.
Upvotes: 1
Reputation: 11
people-also-ask might help you.
pip install people-also-ask
Usage example:
import people_also_ask
people_also_ask.get_related_questions("coffee", 5)
['How did coffee originate?',
'Is coffee good for your health?',
'Who brought coffee America?',
'Who invented coffee?',
'Why is coffee bad for you?',
'Why is drinking coffee bad for you?']
Upvotes: 1