Reputation: 1286
I was trying to scrape the snippet text from a Google search results page, and this solution worked well. The only issue I have now is that the text is in Bangla, while I want it in English.
Here's what I've tried:
options = webdriver.ChromeOptions()
options.add_argument('lang=en')
driver = webdriver.Chrome(executable_path=r'the\path\for\chromedriver.exe', options=options)
I've tried adding 'lang=en' as an argument to ChromeOptions and passing it to webdriver.Chrome(). That's all I could figure out, but it's not working.
Here's the full code:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
options = webdriver.ChromeOptions()
options.add_experimental_option('prefs', {'intl.accept_languages': 'en,en_US'})
options.add_argument('lang=en')
driver = webdriver.Chrome(executable_path=r'C:\Users\camoh\AppData\Local\Programs\Python\Python38\chromedriver.exe', options=options)
driver.get('https://google.com/')
assert "Google" in driver.title
#wait = WebDriverWait(driver, 20)
#wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, ".gLFyf.gsfi")))
input_field = driver.find_element_by_css_selector(".gLFyf.gsfi")
input_field.send_keys("when barack obama born")
input_field.send_keys(Keys.RETURN)
#wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, ".Z0LcW.XcVN5d")))
result = driver.find_element_by_css_selector(".Z0LcW.XcVN5d").text
print(result)
driver.close()
driver.quit()
Here's the page when I run the code:
Upvotes: 1
Views: 689
Reputation: 99
To scrape the Google Search answer box there is no need to use Selenium; you can extract it with the BeautifulSoup web scraping library alone.
For example, with the requests library you can pass URL parameters such as hl, gl or location to set the language, the country of the search, and the location, respectively:
# https://docs.python-requests.org/en/master/user/quickstart/#passing-parameters-in-urls
params = {
# other parameters
"hl": "en", # language, english, https://serpapi.com/google-languages
"gl": "us", # country of the search, US -> USA, https://serpapi.com/google-countries
}
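To make it concrete what those parameters do, here is a minimal sketch that builds the final search URL by hand with the standard library. The helper name build_search_url is hypothetical (it is not part of requests or Selenium); it only shows that hl and gl end up as ordinary query-string parameters.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.google.com/search"


def build_search_url(query, hl="en", gl="us"):
    """Return a search URL with an explicit language (hl) and country (gl).

    Hypothetical helper for illustration only.
    """
    params = {"q": query, "hl": hl, "gl": gl}
    # urlencode escapes the values and joins them as key=value pairs
    return f"{BASE_URL}?{urlencode(params)}"


print(build_search_url("when barack obama born"))
# https://www.google.com/search?q=when+barack+obama+born&hl=en&gl=us
```

If you prefer to stay with Selenium, a URL built this way could presumably be passed straight to driver.get(), which may be enough to get the English page without touching the browser options.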
Check code in the online IDE.
from bs4 import BeautifulSoup
import requests, lxml
# https://docs.python-requests.org/en/master/user/quickstart/#passing-parameters-in-urls
params = {
"q": "when barack obama born", # query example
"hl": "en", # language, english, https://serpapi.com/google-languages
"gl": "us", # country of the search, US -> USA, https://serpapi.com/google-countries
}
# https://docs.python-requests.org/en/master/user/quickstart/#custom-headers
headers = {
"User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36"
}
html = requests.get("https://www.google.com/search", params=params, headers=headers, timeout=30)
soup = BeautifulSoup(html.text, 'lxml')
title = soup.select_one(".i29hTd").text
date = soup.select_one(".t2b5Cf").text
age = soup.select_one(".kZ91ed").text
print(title, date, age)
Output:
Barack Obama/Date of birth August 4, 1961
Also, if you change the parameters "hl": "en", "gl": "us" to "hl": "de", "gl": "de", you can get the output in German:
Barack Obama/Geburtsdatum 4. August 1961 Alter 61 Jahre
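One caveat with the parser above: select_one returns None when a selector does not match, so calling .text on it raises AttributeError. The class names are generated by Google and may change, or a field (such as the age) may be missing for a given language. A small defensive wrapper, sketched here under the assumption that soup is the BeautifulSoup object from the example (select_text is a hypothetical helper name):

```python
def select_text(soup, selector, default=None):
    """Return the stripped text of the first match for selector, or default.

    Hypothetical helper: works with any object that offers select_one(),
    such as a BeautifulSoup instance.
    """
    node = soup.select_one(selector)
    if node is None:
        # The selector matched nothing; avoid AttributeError on None
        return default
    return node.get_text(strip=True)


# Usage with the soup object from the example above:
# title = select_text(soup, ".i29hTd")
# date = select_text(soup, ".t2b5Cf")
# age = select_text(soup, ".kZ91ed", default="")
```

This way a missing element shows up as None (or the default you pass) instead of crashing the script.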
Alternatively, you can use the Google Search Engine Results API from SerpApi. It's a paid API with a free plan. The difference is that it bypasses blocks from Google (including CAPTCHA), so there is no need to write a parser and maintain it.
Code example:
from serpapi import GoogleSearch
import os
params = {
"engine": "google", # SerpApi search engine
"q": "when barack obama born", # query(answer)
"api_key": "...", # serpapi key, https://serpapi.com/manage-api-key
"hl": "en", # language
"gl": "us" # country of the search, US -> USA
}
search = GoogleSearch(params) # where data extraction happens
results = search.get_dict() # JSON -> Python dictionary
answer_box = results["answer_box"]
title = answer_box.get("title")
answer = answer_box.get("answer")
print(title, answer)
Output:
Barack Obama/Date of birth August 4, 1961
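Note that results["answer_box"] raises KeyError when a query has no answer box. A minimal sketch of a more tolerant lookup, assuming results is the dictionary returned by search.get_dict(); the helper name extract_answer and the sample dictionary are made up for illustration:

```python
def extract_answer(results):
    """Return (title, answer) from a results dict, or (None, None) if absent.

    Hypothetical helper; only assumes the "answer_box" layout used above.
    """
    answer_box = results.get("answer_box") or {}
    return answer_box.get("title"), answer_box.get("answer")


# Made-up sample data shaped like the response used in the example above
sample = {"answer_box": {"title": "Barack Obama/Date of birth", "answer": "August 4, 1961"}}
print(extract_answer(sample))
print(extract_answer({}))  # no answer box in the response
```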
Upvotes: 1
Reputation: 3400
You can try the code below, which adds an argument for the preferred language:
from selenium import webdriver  # needed for webdriver.ChromeOptions below
options = webdriver.ChromeOptions()  # create a ChromeOptions object
options.add_argument("--lang=en")
# or, to request US English specifically, use this instead:
# options.add_argument("--lang=en-US")
Upvotes: 2
Reputation: 1836
Use:
options.add_experimental_option('prefs', {'intl.accept_languages': 'en,en_US'})
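The --lang switch and the intl.accept_languages preference can also be set together. Below is a rough sketch of that; apply_language is a hypothetical helper name, and it relies only on the add_argument and add_experimental_option methods already used in this thread. Whether this is enough to get English results may depend on how Google picks the language for your location, so treat it as something to try rather than a guaranteed fix.

```python
def apply_language(options, lang="en", accept_languages="en,en_US"):
    """Set the UI language switch and the Accept-Language preference.

    Hypothetical helper; options is expected to be a ChromeOptions object.
    """
    # Chrome command-line switch for the browser UI language
    options.add_argument(f"--lang={lang}")
    # Preference for the languages the browser asks for.
    # Note: this sets the whole "prefs" dict, replacing any set earlier.
    options.add_experimental_option("prefs", {"intl.accept_languages": accept_languages})
    return options


# Usage (requires Selenium and a matching chromedriver):
# from selenium import webdriver
# options = apply_language(webdriver.ChromeOptions())
# driver = webdriver.Chrome(options=options)
```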
Upvotes: 1