Reputation: 11
I'm trying to scrape the results of a Google search for "Coffee Shop" and get the shop name, address, etc. into a DataFrame, run some analysis, and export it to Excel.
I tried pandas read_html, but it returned 'HTTPError: HTTP Error 403: Forbidden'. Any idea how to do this?
Upvotes: 1
Views: 5548
Reputation: 5145
You can also use a third-party service like Serp API, which is a Google search engine results API. It takes care of proxies and parsing for you.
It's easy to integrate with Python:
from lib.google_search_results import GoogleSearchResults
params = {
"q" : "Coffee",
"location" : "Austin, Texas, United States",
"hl" : "en",
"gl" : "us",
"google_domain" : "google.com",
"api_key" : "demo",
}
query = GoogleSearchResults(params)
dictionary_results = query.get_dictionary()
GitHub: https://github.com/serpapi/google-search-results-python
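To get from that dictionary to the DataFrame the question asks for, the results first need to be flattened into uniform rows. Here is a minimal sketch, assuming the places sit in a list under a "local_results" key with "title" and "address" fields. Those key and field names are assumptions, not confirmed by the snippet above, so check them against the actual response. The helper places_to_rows is hypothetical and sample_results is made-up data.

```python
def places_to_rows(results, key="local_results", fields=("title", "address")):
    """Flatten the list of place dicts found under `key` into uniform rows.

    `key` and `fields` are assumptions about the response shape; missing
    fields become None so every row has the same columns.
    """
    rows = []
    for place in results.get(key, []):
        rows.append({field: place.get(field) for field in fields})
    return rows


# Made-up sample with the assumed shape (not real API output)
sample_results = {
    "local_results": [
        {"title": "Example Coffee", "address": "1 Main St", "rating": 4.5},
        {"title": "Another Cafe"},
    ]
}

rows = places_to_rows(sample_results)
```

From there, pandas.DataFrame(rows) gives a table with one column per field, and DataFrame.to_excel can write it out, provided an Excel writer engine such as openpyxl is installed.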
Upvotes: 1
Reputation: 2415
You got the 403 error because you are blacklisted; Google doesn't let you scrape its results!
Some techniques you can use are described here:
Manage blacklisted request with Scrapy
How to prevent getting blacklisted while scraping
Upvotes: 0
Reputation: 2747
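You can use Selenium WebDriver like this:
import os
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
# chromedriver.exe is expected next to this script (Selenium 4 style API)
base_dir = os.path.dirname(os.path.abspath(__file__))
driver = webdriver.Chrome(service=Service(os.path.join(base_dir, 'chromedriver.exe')))
url = "https://www.example.com"
driver.get(url)
# get the addresses from the html document
addresses = []
for elem in driver.find_elements(By.XPATH, './/div[@class = "address"]'):
    addresses.append(elem.text)
driver.quit()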
To do this, however, you need to download chromedriver. You also need to view the source of the web page to find the tag and attribute that hold the information you are looking for. A comprehensive example can be found in this Example
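One way to check a tag/attribute selector before driving a browser is to try it on a small saved snippet. The sketch below uses the standard library's ElementTree, which understands only a subset of XPath and only well-formed markup, so it stands in for the browser purely as an illustration. The markup in sample_html is made up; the class name "address" is the one from the answer's example.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed snippet imitating the structure you would look for
# in the page source; real pages usually need a browser or an HTML parser.
sample_html = """
<html><body>
  <div class="shop">
    <div class="name">Example Coffee</div>
    <div class="address">1 Main St</div>
  </div>
  <div class="shop">
    <div class="name">Another Cafe</div>
    <div class="address">2 Side Ave</div>
  </div>
</body></html>
"""

root = ET.fromstring(sample_html)

# Select every <div> whose class attribute is exactly "address"
addresses = [div.text for div in root.findall('.//div[@class="address"]')]
```

If the expression returns nothing on the saved snippet, the tag or attribute is probably not the one the page actually uses.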
Upvotes: 0
Reputation: 6639
First of all, scraping is discouraged because it is against Google's ToS.
However, if you still want to go ahead and scrape their data, there are scraping tools for Python like:
I just assumed you are using Python. In case you are using R, you can then use:
Alternatively, you can also use their Places Search API and Places Details API.
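A rough sketch of the Places route, using only the standard library. It assumes the Text Search web service endpoint and the "results", "name", "formatted_address" and "place_id" response fields as I recall them from Google's documentation, so verify them against the current docs. The helper names are hypothetical, the API key is a placeholder, and sample_payload is made-up data so the parsing can be tried without a network call.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint of the Places Text Search web service; verify before use
TEXTSEARCH_URL = "https://maps.googleapis.com/maps/api/place/textsearch/json"


def build_textsearch_url(query, api_key):
    """Build the request URL with the query and key URL-encoded."""
    params = urllib.parse.urlencode({"query": query, "key": api_key})
    return f"{TEXTSEARCH_URL}?{params}"


def fetch_places(query, api_key):
    """Perform the request and decode the JSON body (needs a valid key)."""
    with urllib.request.urlopen(build_textsearch_url(query, api_key)) as resp:
        return json.load(resp)


def parse_places(payload):
    """Keep only the fields needed for the table; missing ones become None."""
    return [
        {
            "name": place.get("name"),
            "address": place.get("formatted_address"),
            "place_id": place.get("place_id"),
        }
        for place in payload.get("results", [])
    ]


# Made-up payload with the assumed shape (not real API output)
sample_payload = {
    "results": [
        {"name": "Example Coffee", "formatted_address": "1 Main St", "place_id": "abc"},
        {"name": "Another Cafe", "place_id": "def"},
    ],
    "status": "OK",
}

shops = parse_places(sample_payload)
```

With a real key, parse_places(fetch_places("Coffee Shop", "YOUR_API_KEY")) would produce the same kind of list, which can be passed to pandas.DataFrame. The place_id is kept because the Places Details API looks places up by that identifier.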
Upvotes: 1