user3510503

Reputation: 358

How to scrape google maps using python

I am trying to scrape the number of reviews of a place from google maps using python. For example the restaurant Pike's Landing (see google maps URL below) has 162 reviews. I want to pull this number in python.

URL: https://www.google.com/maps?cid=15423079754231040967

I am not very well versed in HTML. Based on some basic examples on the internet I wrote the following code, but all I get after running it is a blank variable. If you could let me know what I am doing wrong, that would be much appreciated.

from urllib.request import urlopen
from bs4 import BeautifulSoup

quote_page ='https://www.google.com/maps?cid=15423079754231040967'
page = urlopen(quote_page)
soup = BeautifulSoup(page, 'html.parser')
price_box = soup.find_all('button',attrs={'class':'widget-pane-link'})
print(price_box.text)

Upvotes: 7

Views: 17172

Answers (3)

Dmitriy Zub

Reputation: 1724

You need to view the page source and parse the window.APP_INITIALIZATION_STATE variable block with a regular expression; it contains all the data you need.
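
A minimal sketch of that regex approach follows. It assumes the state is assigned inline and is followed by a window.APP_FLAGS assignment, and that the count appears as English text such as "469 reviews"; both are assumptions about an undocumented page layout that may change. The function name and the sample page source are hypothetical and exist only for illustration.

```python
import re
from typing import Optional

# Assumption: the state is assigned inline as
#   window.APP_INITIALIZATION_STATE=[...];window.APP_FLAGS=...
# This layout is undocumented and may change without notice.
STATE_RE = re.compile(
    r"window\.APP_INITIALIZATION_STATE\s*=\s*(.*?);\s*window\.APP_FLAGS",
    re.DOTALL,
)
# Matches text such as "469 reviews" or "1,204 reviews" (English pages only).
COUNT_RE = re.compile(r"(\d[\d,]*) reviews?")


def extract_review_count(html: str) -> Optional[int]:
    """Return the first review count found in the state block, or None."""
    state = STATE_RE.search(html)
    if state is None:
        return None
    count = COUNT_RE.search(state.group(1))
    if count is None:
        return None
    return int(count.group(1).replace(",", ""))


# Hypothetical, heavily simplified page source used only to exercise the function.
sample_html = (
    '<script>window.APP_INITIALIZATION_STATE=[[["x",null,'
    '"[\\"http://www.google.com/search?q=x\\",\\"469 reviews\\"]"]]];'
    "window.APP_FLAGS=[];</script>"
)
print(extract_review_count(sample_html))  # 469
```

For a live page you would pass in the downloaded HTML (for example requests.get(url).text with &hl=en appended to the URL). Google may serve a consent or block page instead, in which case the function returns None.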


Alternatively, you can use the Google Maps Reviews API from SerpApi.

Example JSON output:

"place_results": {
  "title": "Pike's Landing",
  "data_id": "0x51325b1733fa71bf:0xd609c9524d75cbc7",
  "reviews_link": "https://serpapi.com/search.json?engine=google_maps_reviews&hl=en&place_id=0x51325b1733fa71bf%3A0xd609c9524d75cbc7",
  "gps_coordinates": {
    "latitude": 64.8299557,
    "longitude": -147.8488774
  },
  "place_id_search": "https://serpapi.com/search.json?data=%214m5%213m4%211s0x51325b1733fa71bf%3A0xd609c9524d75cbc7%218m2%213d64.8299557%214d-147.8488774&engine=google_maps&google_domain=google.com&hl=en&type=place",
  "thumbnail": "https://lh5.googleusercontent.com/p/AF1QipNtwheOCQ97QFrUNIwKYUoAPiV81rpiW5cIiQco=w152-h86-k-no",
  "rating": 3.9,
  "reviews": 839,
  "price": "$$",
  "type": [
    "American restaurant"
  ],
  "description": "Burgers, seafood, steak & river views. Pub fare alongside steak & seafood, served in a dining room with river views & a waterfront patio.",
  "service_options": {
    "dine_in": true,
    "curbside_pickup": true,
    "delivery": false
  }
}

Code to integrate:

import os
from serpapi import GoogleSearch

params = {
    "engine": "google_maps",
    "type": "search",
    "q": "pike's landing",
    "ll": "@40.7455096,-74.0083012,14z",
    "google_domain": "google.com",
    "api_key": os.getenv("API_KEY"),
}

search = GoogleSearch(params)
results = search.get_dict()

reviews = results["place_results"]["reviews"]

print(reviews)

Output:

839

Disclaimer: I work for SerpApi.

Upvotes: 0

Vlad

Reputation: 1

Scraping Google Maps without a browser or proxies will get you blocked after a few successful requests. The main problem in scraping Google is therefore dealing with cookies and reCAPTCHA.

This is a good post with an example of using Selenium in Python for the same purpose. The general idea is that you start a browser and simulate what a user does on the website.

Another way is to use a reliable third-party service that does the whole job for you and returns the results. For example, you can try Outscraper's Reviews service, which has a free tier.

from datetime import datetime, timedelta

from outscraper import ApiClient

api_client = ApiClient(api_key='SECRET_API_KEY')

# Get reviews of the specific place by id
result = api_client.google_maps_reviews('ChIJrc9T9fpYwokRdvjYRHT8nI4', reviewsLimit=20, language='en')

# Get reviews for places found by search query
result = api_client.google_maps_reviews('Memphis Seoul brooklyn usa', reviewsLimit=20, limit=500, language='en')

# Get only new reviews during last 24 hours
yesterday_timestamp = int((datetime.now() - timedelta(1)).timestamp())

result = api_client.google_maps_reviews(
    'ChIJrc9T9fpYwokRdvjYRHT8nI4', sort='newest', cutoff=yesterday_timestamp, reviewsLimit=100, language='en')

Disclaimer: I work for Outscraper.

Upvotes: -4

Andrej Kesely

Reputation: 195593

It's hard to do this in pure Python without an API. Here's what I ended up with (note that I added &hl=en at the end of the URL to get results in English rather than in my language):

import re
import requests
from ast import literal_eval

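urls = [
    'https://www.google.com/maps?cid=15423079754231040967&hl=en',
    'https://www.google.com/maps?cid=16168151796978303235&hl=en',
]

for url in urls:
    # Find the escaped array in the page source that holds the reviews link and the count.
    for g in re.findall(r'\[\\"http.*?\d+ reviews?.*?]', requests.get(url).text):
        # Turn the escaped, JSON-like text into a Python list.
        data = literal_eval(g.replace('null', 'None').replace('\\"', '"'))
        # Decode escapes such as \u003d in the link, then print the link and the count text.
        print(bytes(data[0], 'utf-8').decode('unicode_escape'))
        print(data[1])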

Prints:

http://www.google.com/search?q=Pike's+Landing,+4438+Airport+Way,+Fairbanks,+AK+99709,+USA&ludocid=15423079754231040967#lrd=0x51325b1733fa71bf:0xd609c9524d75cbc7,1
469 reviews
http://www.google.com/search?q=Sequoia+TreeScape,+Newmarket,+ON+L3Y+8R5,+Canada&ludocid=16168151796978303235#lrd=0x882ad2157062b6c3:0xe060d065957c4103,1
42 reviews
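
If you want the count as an integer rather than text like "469 reviews", the loop body can be wrapped in a small helper. The sketch below is only an illustration: parse_fragment is a hypothetical name, and the sample string is a made-up, shortened stand-in for what the regex matches on the real page.

```python
import re
from ast import literal_eval

# Hypothetical fragment shaped like a match of the regex above;
# the real page embeds a much longer, undocumented structure.
sample = r'[\"http://www.google.com/search?q\\u003dx\",\"469 reviews\",null]'


def parse_fragment(fragment):
    """Return (reviews_url, review_count) from one matched fragment."""
    # Turn the escaped, JSON-like text into a Python list.
    data = literal_eval(fragment.replace('null', 'None').replace('\\"', '"'))
    # Decode escapes such as \u003d ("=") left in the URL.
    url = bytes(data[0], 'utf-8').decode('unicode_escape')
    # Keep only the number, dropping thousands separators if present.
    count = int(re.search(r'\d[\d,]*', data[1]).group().replace(',', ''))
    return url, count


print(parse_fragment(sample))
```

In the loop above you would call parse_fragment(g) in place of the two print statements.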

Upvotes: 5
