Reputation: 147
I am trying to search Google via URL parameters. It works when I search for a single word, but once I add a space it breaks. I know there is a way to encode the URL.
import urllib.request
from urllib.parse import urlencode, quote_plus
from fake_useragent import UserAgent
import time
import requests
from bs4 import BeautifulSoup
keyword = "host free"
url = "https://www.google.co.il/search?q=%s" % (keyword)
print(url)
thepage = urllib.request.Request(url, headers=request_headers)
page = urllib.request.urlopen(thepage)
# Continue...
Traceback:
https://www.google.co.il/search?q=host free
Traceback (most recent call last):
File "C:\Users\Maor Ben Lulu\Desktop\Maor\Python\google\Google_Bot_new.py", line 42, in <module>
page = urllib.request.urlopen(thepage)
File "C:\Program Files (x86)\Python37-32\lib\urllib\request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "C:\Program Files (x86)\Python37-32\lib\urllib\request.py", line 531, in open
response = meth(req, response)
File "C:\Program Files (x86)\Python37-32\lib\urllib\request.py", line 641, in http_response
'http', request, response, code, msg, hdrs)
File "C:\Program Files (x86)\Python37-32\lib\urllib\request.py", line 569, in error
return self._call_chain(*args)
File "C:\Program Files (x86)\Python37-32\lib\urllib\request.py", line 503, in _call_chain
result = func(*args)
File "C:\Program Files (x86)\Python37-32\lib\urllib\request.py", line 649, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 400: Bad Request
Also, when I write the keyword in Hebrew, I get:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 14-18: ordinal not in range(128)
Upvotes: 1
Views: 324
Reputation: 1724
The Requests library can do this for you, as Gahan mentioned. Pass the query params and headers as dictionaries to requests.get():
headers = {
'User-agent':
'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582'
# other headers (if needed)
}
params = {
'q': 'how to create minecraft server', # query
'gl': 'us', # country to search from (United States in this case)
'hl': 'en' # language
# other params (if needed)
}
html = requests.get('https://www.google.com/search', headers=headers, params=params)
soup = BeautifulSoup(html.text, 'lxml')
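To see what URL Requests would actually send for a given params dictionary, without contacting Google, you can prepare the request and inspect it. This is only a sketch: the helper name preview_search_url is made up for illustration, and the gl/hl values are just the ones used above.

```python
import requests


def preview_search_url(query, base_url="https://www.google.com/search"):
    """Return the fully encoded URL Requests would use (hypothetical helper)."""
    params = {"q": query, "gl": "us", "hl": "en"}
    # Preparing the request encodes the params but sends nothing over the network.
    prepared = requests.Request("GET", base_url, params=params).prepare()
    return prepared.url


print(preview_search_url("how to create minecraft server"))
```

The space in the query is encoded for you, which is the step that was missing in the question's hand-built URL.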
Code and example in the online IDE:
from bs4 import BeautifulSoup
import requests, lxml
headers = {
'User-agent':
'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582'
}
params = {
'q': 'how to create minecraft server',
'gl': 'us',
'hl': 'en',
}
html = requests.get('https://www.google.com/search', headers=headers, params=params).text
soup = BeautifulSoup(html, 'lxml')
for result in soup.select('.tF2Cxc'):
    title = result.select_one('.yuRUbf').text
    link = result.select_one('.yuRUbf a')['href']
    print(title, link, sep='\n')
---------
'''
How to Setup a Minecraft: Java Edition Server – Home
https://help.minecraft.net/hc/en-us/articles/360058525452-How-to-Setup-a-Minecraft-Java-Edition-Server
Minecraft Server Download
https://www.minecraft.net/en-us/download/server
Setting Up Your Own Minecraft Server - iD Tech
https://www.idtech.com/blog/creating-minecraft-server
Tutorials/Setting up a server - Minecraft Wiki
https://minecraft.fandom.com/wiki/Tutorials/Setting_up_a_server
# other results
'''
Alternatively, you can achieve the same thing with the Google Organic Results API from SerpApi. It's a paid API with a free plan.
The difference in your case is that you don't have to spend time on URL encoding or on bypassing blocks from Google, in case passing a user-agent in the request headers is not enough.
Instead, you only need to iterate over the structured JSON returned for your parameters (params) and pick out the data you want.
Example code to integrate:
import os
from serpapi import GoogleSearch
params = {
"engine": "google",
"q": "how to create minecraft server",
"hl": "en",
"gl": "us",
"api_key": os.getenv("API_KEY"),
}
search = GoogleSearch(params)
results = search.get_dict()
# scrapes first page of Google results
for result in results["organic_results"]:
    print(result['title'])
    print(result['link'])
---------
'''
How to Setup a Minecraft: Java Edition Server – Home
https://help.minecraft.net/hc/en-us/articles/360058525452-How-to-Setup-a-Minecraft-Java-Edition-Server
Minecraft Server Download
https://www.minecraft.net/en-us/download/server
Setting Up Your Own Minecraft Server - iD Tech
https://www.idtech.com/blog/creating-minecraft-server
Tutorials/Setting up a server - Minecraft Wiki
https://minecraft.fandom.com/wiki/Tutorials/Setting_up_a_server
# other results
'''
Disclaimer, I work for SerpApi.
Upvotes: 0
Reputation: 4213
There is a way to encode the URL with urllib.parse.quote, but the requests module is very helpful in all such cases, and you can use it as below:
import requests
base_url = 'https://www.google.co.il/search'
res = requests.get(base_url, params={'q': 'host free'}) # query parameter and value in dict format to be passed as params kwarg
As you can see above, you pass the query parameters as a dict via the params keyword argument, and requests URL-encodes them for you.
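If you would rather stay with urllib, as in the question, the encoding can be done with the standard library alone. The following is a minimal sketch, assuming the same base URL as the question; build_search_url is a hypothetical helper name. It handles both the space and the Hebrew keyword, since non-ASCII characters are percent-encoded as UTF-8 bytes.

```python
from urllib.parse import quote_plus, urlencode


def build_search_url(keyword, base_url="https://www.google.co.il/search"):
    """Build a search URL with a safely encoded query string (hypothetical helper)."""
    # urlencode applies quote_plus to each value: spaces become '+',
    # and non-ASCII characters are percent-encoded as UTF-8.
    return base_url + "?" + urlencode({"q": keyword})


print(build_search_url("host free"))
print(build_search_url("שלום"))

# quote_plus can also be applied to a single value when building the URL by hand.
print("https://www.google.co.il/search?q=%s" % quote_plus("host free"))
```

The resulting string contains only ASCII characters, so it can be passed to urllib.request.Request without triggering the 400 response or the UnicodeEncodeError.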
Upvotes: 1