Reputation: 386
I am trying to search Google for some products, but the language of the results Google returns depends on the proxy. I have tried to fix it by adding 'accept-language': 'en-US,en;q=0.9'
to my headers, but with no success.
import requests
from bs4 import BeautifulSoup

products = ["Majestic Pet Stairs Steps", "Ball Jars Wide Mouth Lids 12/Pack", "LED Duck Color Changing Floating Speaker"]

for product in products:
    headers = {
        'authority': 'www.google.com',
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36',
        'accept-language': 'en-US,en;q=0.9'}
    url = 'https://google.com/search?q={}'.format(product)
    PROXY = None
    res = requests.get(url, headers=headers, proxies=PROXY)
    if res.status_code != 200:
        print("bad proxy")
        break
    soup = BeautifulSoup(res.text, "lxml")
    print(soup.title.text)
What I want is to always get the results in English, regardless of the proxy.
Upvotes: 4
Views: 1495
Reputation: 1724
Have you tried placing the uule (encoded location), hl=en, or lr=lang_en parameters in your request URL?
response = requests.get('https://google.com/search?q=FUS RO DAH&hl=en')
Or using URL params:
params = {
    'q': 'FUS RO DAH',
    'hl': 'en',                    # the language to use for the Google search
    'gl': 'us',                    # the country to use for the Google search
    'lr': 'lang_en',               # one or multiple languages to limit the search to
    'uule': 'w+CAIQICIGQnJhemls',  # the encoded location (here: Brazil) to use for the search
}
import requests
from bs4 import BeautifulSoup

# https://www.whatismybrowser.com/detect/what-is-my-user-agent/
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.5060.134 Safari/537.36',
}

products = ["Majestic Pet Stairs Steps", "Ball Jars Wide Mouth Lids 12/Pack", "LED Duck Color Changing Floating Speaker"]

for product in products:
    params = {
        'q': product,
        'hl': 'en',
        'gl': 'us',
        'lr': 'lang_en'
    }
    html = requests.get('https://www.google.com/search', headers=headers, params=params)
    soup = BeautifulSoup(html.text, 'html.parser')
    print(soup)
Alternatively, you can do the same thing by using Google Search Engine Results API from SerpApi. It's a paid API with a free plan of 100 searches to test out. Check out the playground.
from serpapi import GoogleSearch

params = {
    "api_key": "YOUR_API_KEY",
    "engine": "google",
    "q": "spotlight 29 casino address",
    "google_domain": "google.com.br",
    "gl": "br",
    "hl": "pt",
    "uule": "w+CAIQICIGQnJhemls",
}

search = GoogleSearch(params)
results = search.get_dict()

# print all titles from Google organic results
for result in results["organic_results"]:
    print(result["title"])
Disclaimer: I work for SerpApi.
Upvotes: 3
Reputation: 195
There is a handy library I use for my searches; below is a snippet from my app.
Install it with pip install google.
from googlesearch import search
results = list(search(str(tag)+' '+str(intitle), domains = ['stackoverflow.com'], stop = SITE.page_size))
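For context, here is a minimal, self-contained sketch of how this package could be applied to the question's product searches. It assumes the googlesearch.search generator accepts the lang and stop keyword arguments (they exist in the google package, but check your installed version):

from googlesearch import search

products = ["Majestic Pet Stairs Steps",
            "Ball Jars Wide Mouth Lids 12/Pack",
            "LED Duck Color Changing Floating Speaker"]

for product in products:
    # lang='en' asks Google for English results; stop=10 limits output to 10 URLs
    for url in search(product, lang='en', stop=10):
        print(url)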
Upvotes: 1
Reputation: 2479
Google provides an API for searching: https://developers.google.com/custom-search/v1/overview
If you do a lot of automated querying via web scraping, they are likely to start serving a CAPTCHA or blocking you.
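For illustration, a minimal sketch of querying the Custom Search JSON API with requests. The API key and search engine ID (cx) below are hypothetical placeholders you would replace with your own credentials; lr and gl restrict the language and country of the results:

import requests

API_KEY = "YOUR_API_KEY"            # hypothetical placeholder
SEARCH_ENGINE_ID = "YOUR_CX_ID"     # hypothetical placeholder

params = {
    "key": API_KEY,
    "cx": SEARCH_ENGINE_ID,
    "q": "Majestic Pet Stairs Steps",
    "lr": "lang_en",   # restrict results to English
    "gl": "us",        # geolocation of the end user
}

response = requests.get("https://www.googleapis.com/customsearch/v1", params=params)
for item in response.json().get("items", []):
    print(item["title"], item["link"])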
Upvotes: 1