bob

Reputation: 79

How do I extract the latitude, longitude, and location name from a website with Beautiful Soup and Python?

from bs4 import BeautifulSoup
import requests
import pandas as pd
import re
import json

source = requests.get('https://www.cellcard.com.kh/en/detail/cellcard-shops/').text.encode('utf8').decode('ascii', 'ignore')
response = BeautifulSoup(source, 'lxml')

After that, I have no idea how to extract the latitude, longitude, and name of each location. Should I just inspect the network traffic and copy the data manually? I have a screenshot of the inspected frame source below.

Link: https://www.cellcard.com.kh/en/detail/cellcard-shops/

View frame source: https://www.google.com/maps/d/u/0/embed?mid=1zDbb1_TelVm0aEEXqRlxsmNTghtOc1Ug&ll=15.243727686460069%2C107.72306438411047&z=6

Website Screenshot

Upvotes: 1

Views: 779

Answers (3)

Andrej Kesely

Reputation: 195553

You can download the map's .kmz data, which contains the place names and coordinates. All you need is the Google Maps map ID:

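import io
import zipfile
from urllib.parse import urlparse, parse_qs

import requests
from bs4 import BeautifulSoup


url = 'https://www.cellcard.com.kh/en/detail/cellcard-shops/'

soup = BeautifulSoup(requests.get(url).content, 'html.parser')
# the Google My Maps iframe sits in the <h2> right after the "Find Cellcard shops near you!" heading
google_map_url = soup.select_one('h2:-soup-contains("Find Cellcard shops near you!") + h2 iframe')['src']
# read the `mid` query parameter (the map ID) rather than splitting on '=', which breaks if more parameters follow
google_map_id = parse_qs(urlparse(google_map_url).query)['mid'][0]
open_map_data_url = 'https://www.google.com/maps/d/kml?mid=' + google_map_id   # <-- retrieve URL of .kmz file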

# unpack the .kmz file and print the data
r = requests.get(open_map_data_url, stream=True)
z = zipfile.ZipFile(io.BytesIO(r.content))
soup = BeautifulSoup(z.read('doc.kml'), 'html.parser')

for name, coord in zip(soup.select('placemark name'), soup.select('placemark coordinates')):
    print('{:<40} {}'.format(name.get_text(strip=True), coord.get_text(strip=True)))

Prints:

Cellcard Shop - Svay Rieng               105.7973236,11.0828573,0
Cellcard Shop - Battambang               103.1971734,13.1016807,0
Cellcard Shop - Preah Vihea              104.97836,13.80754,0
Cellcard Shop - Oudong                   104.74631,11.82251,0
Cellcard Shop - Pailin                   102.60507,12.85738,0
Cellcard Shop - Memot                    106.182635,11.8281502,0
Cellcard Shop - Ta Khmao                 104.94545,11.48132,0
Cellcard Shop - Siem Reap                103.85765,13.35747,0
Cellcard Shop - Kampong Cham             105.46173,11.98676,0
Cellcard Shop - Banteay Mean Chey        102.9734105,13.5847979,0
Cellcard Shop- Kratie                    106.01692,12.49176,0
Cellcard Shop - Poi Pet                  102.564986,13.6515218,0
Cellcard Shop - Rattanakiri              106.98667,13.73767,0
Cellcard Shop - Saang                    105.00483,11.36499,0
Cellcard Shop - Ang Ta Saom              104.67213,11.01284,0
Cellcard Shop - Kralanh                  103.41696,13.58708,0
Cellcard Shop - Kampong Speu             104.52267,11.46298,0
Cellcard Shop - KohKong                  102.98434,11.61512,0
Cellcard Shop - Kampot                   104.1817547,10.6088833,0
Cellcard Shop - Pursat                   103.91766,12.53332,0
Cellcard Shop - Mondulkiri               107.18583,12.45962,0
Cellcard Shop - Skun                     105.07256,12.05591,0
Cellcard Shop - Oudor Mean Chey          103.50693,14.17684,0
Cellcard Shop - Stueng Treng             105.97041,13.52817,0
Cellcard Shop - Kampong Thom             105.2194808,12.90616,0
Cellcard Shop - Neak Loeung              105.28958,11.26162,0
Cellcard Shop - Bavil                    102.87552,13.25641,0
Cellcard Shop - Punley                   104.46283,12.43703,0
Cellcard Shop - Dom Dek                  104.12392,13.24428,0
Cellcard Shop - Thmor Kol                103.0893354,13.2626185,0
Cellcard Shop - Bek Chan                 104.74816,11.51111,0
Cellcard Shop - Prek Anhchanh            104.97139,11.7317,0
Cellcard Shop - Kampong Trach            104.4649104,10.5589497,0
Cellcard Shop - Moung Russey             103.448808,12.774255,0
Cellcard Shop - Stoung                   104.57211,12.93958,0
Cellcard Shop - Takeo                    104.7797767,10.9847041,0
Cellcard Shop - Kampong Chhnang          104.66808,12.25278,0
Cellcard Shop - Sdoa                     102.97318,12.89153,0
Cellcard Shop - Kampong Thmor            105.124583,12.4993313,0
Cellcard Shop - Chhouk                   104.45618,10.83606,0
Cellcard Shop - Krokor                   104.2099,12.53235,0
Cellcard Shop - Samraong Yaong           104.82445,11.26427,0
Cellcard Shop - Sihanoukville            103.52539,10.62278,0
Cellcard                                 104.925084,11.556262,0
Cellcard Shop - Kampuchea Krom           104.910777,11.5683444,0
Cellcard Shop - Toul Tom Poung           104.9154285,11.5400873,0
Cellcard Shop - Sovanna                  104.9007255,11.5451703,0
Cellcard Shop - Camko                    104.8970584,11.5912519,0
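The coordinates above are in KML order: longitude first, then latitude, then altitude, so `105.7973236,11.0828573,0` means latitude 11.08, longitude 105.80. If you want (name, lat, lon) tuples directly, here is a minimal sketch using the standard library's `xml.etree.ElementTree` instead of BeautifulSoup. It assumes `doc.kml` uses the standard KML 2.2 namespace and that each shop is a `<Point>` placemark; `parse_placemarks` is a hypothetical helper name, not part of any library.

```python
import xml.etree.ElementTree as ET

# Assumption: the exported doc.kml uses the standard KML 2.2 namespace.
KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}


def parse_placemarks(kml_data):
    """Return a list of (name, latitude, longitude) tuples from KML text or bytes.

    KML stores a point as "longitude,latitude[,altitude]", so the first
    two values are swapped when building each tuple.
    """
    root = ET.fromstring(kml_data)
    places = []
    # iter() walks every descendant, so placemarks nested in Folders are found too
    for placemark in root.iter("{http://www.opengis.net/kml/2.2}Placemark"):
        name = placemark.findtext("kml:name", default="", namespaces=KML_NS).strip()
        coords = placemark.findtext(".//kml:Point/kml:coordinates", namespaces=KML_NS)
        if not coords:
            continue  # skip placemarks that are not simple points (lines, polygons)
        lon, lat = (float(v) for v in coords.strip().split(",")[:2])
        places.append((name, lat, lon))
    return places
```

You would call it with the bytes read from the archive, e.g. `parse_placemarks(z.read('doc.kml'))`, and could pass the result to `pandas.DataFrame(places, columns=['name', 'lat', 'lon'])` if you want a table.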

Upvotes: 1

KunduK

Reputation: 33384

Fetch the frame source and use a regex to find the latitude and longitude values.

import re

import requests
from bs4 import BeautifulSoup

source = requests.get('https://www.google.com/maps/d/u/0/embed?mid=1zDbb1_TelVm0aEEXqRlxsmNTghtOc1Ug&ll=15.243727686460069%2C107.72306438411047&z=6').text
response = BeautifulSoup(source, 'lxml')
# grab every decimal number in the page text (coordinates plus some other values)
matches = re.findall(r"(\d+\.\d+)", response.text)
for match in matches:
    print(match)

Output:

102.56498599999998
14.17684
107.18583000000001
10.5589497
102.56498599999998
14.17684
107.18583000000001
10.5589497
11.0828573
105.79732360000004
11.0828573
105.79732360000003
13.1016807
103.1971734
13.1016807
103.1971734
13.80754
104.97836000000007
13.80754
104.97836000000007
11.82251
104.74631
11.82251
104.74631
12.85738
102.60507000000007
12.85738
102.60507000000007
11.8281502
106.182635
11.8281502
106.182635
11.48132
104.94544999999994
11.48132
104.94544999999994
13.35747
103.85765000000004
13.35747
103.85765000000004
11.98676
105.46172999999999
11.98676
105.46172999999999
13.5847979
102.9734105
13.5847979
102.9734105
12.49176
106.01692000000003
12.49176
106.01692000000003
13.6515218
102.56498599999998
13.6515218
102.56498599999998
13.737670000000001
106.98667
13.73767
106.98667
11.36499
105.00482999999997
11.36499
105.00482999999997
11.01284
104.67212999999992
11.01284
104.67212999999992
13.58708
103.41696000000002
13.58708
103.41696000000002
11.46298
104.52267000000006
11.46298
104.52267000000006
11.61512
102.98433999999997
11.61512
102.98433999999997
10.6088833
104.18175470000006
10.6088833
104.18175470000006
12.53332
103.91766000000007
12.53332
103.91766000000007
12.45962
107.18583000000001
12.45962
107.18583000000001
12.055910000000003
105.07256000000007
12.05591
105.07256000000007
14.17684
103.50693000000001
14.17684
103.50693000000001
13.52817
105.97041000000002
13.52817
105.97041000000002
12.90616
105.21948079999993
12.90616
105.21948079999993
11.26162
105.28958000000002
11.26162
105.28958
13.25641
102.87552000000007
13.25641
102.87552000000005
12.43703
104.46282999999994
12.43703
104.46282999999994
13.24428
104.12392
13.24428
104.12392
13.2626185
103.08933539999998
13.2626185
103.08933539999998
11.51111
104.74815999999998
11.51111
104.74815999999998
11.7317
104.97138999999993
11.7317
104.97138999999993
10.5589497
104.46491040000001
10.5589497
104.46491040000001
12.774255
103.44880799999999
12.774255
103.44880799999999
12.93958
104.57211000000007
12.93958
104.57211000000007
10.9847041
104.77977669999996
10.9847041
104.77977669999996
12.25278
104.66808000000003
12.25278
104.66808000000003
12.89153
102.97317999999996
12.89153
102.97317999999996
12.4993313
105.12458300000003
12.4993313
105.12458300000003
10.83606
104.45618000000002
10.83606
104.45618000000002
12.53235
104.20990000000008
12.53235
104.20990000000006
11.26427
104.82445000000007
11.26427
104.82445000000007
10.62278
103.52539000000002
10.62278
103.52539000000002
11.0828573
105.79732360000003
13.1016807
103.1971734
13.80754
104.97836000000007
11.82251
104.74631
12.85738
102.60507000000007
11.8281502
106.182635
11.48132
104.94544999999994
13.35747
103.85765000000004
11.98676
105.46172999999999
13.5847979
102.9734105
12.49176
106.01692000000003
13.6515218
102.56498599999998
13.73767
106.98667
11.36499
105.00482999999997
11.01284
104.67212999999992
13.58708
103.41696000000002
11.46298
104.52267000000006
11.61512
102.98433999999997
10.6088833
104.18175470000006
12.53332
103.91766000000007
12.45962
107.18583000000001
12.05591
105.07256000000007
14.17684
103.50693000000001
13.52817
105.97041000000002
12.90616
105.21948079999993
11.26162
105.28958
13.25641
102.87552000000005
12.43703
104.46282999999994
13.24428
104.12392
13.2626185
103.08933539999998
11.51111
104.74815999999998
11.7317
104.97138999999993
10.5589497
104.46491040000001
12.774255
103.44880799999999
12.93958
104.57211000000007
10.9847041
104.77977669999996
12.25278
104.66808000000003
12.89153
102.97317999999996
12.4993313
105.12458300000003
10.83606
104.45618000000002
12.53235
104.20990000000006
11.26427
104.82445000000007
10.62278
103.52539000000002
0.25
1.0
0.2980392156862745
1.0
11.556262
104.925084
11.556262
104.925084
11.5683444
104.910777
11.5683444
104.910777
11.5400873
104.9154285
11.5400873
104.9154285
11.5451703
104.9007255
11.5451703
104.9007255
11.5912519
104.89705840000002
11.5912519
104.8970584
11.556262
104.925084
11.5683444
104.910777
11.5400873
104.9154285
11.5451703
104.9007255
11.5912519
104.8970584
0.25
1.0
0.2980392156862745
1.0
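The regex approach returns a flat list of numbers: every point appears more than once (sometimes with float noise such as `105.79732360000004` vs `105.79732360000003`), some values like `0.25` and `1.0` are not coordinates, and the shop names are lost. A rough post-processing sketch, assuming the values mostly alternate latitude then longitude as in the output above, groups them into pairs, rounds away the noise, drops duplicates, and optionally keeps only pairs inside a bounding box. `pair_coordinates` and the box are illustrative assumptions, not something the page provides.

```python
def pair_coordinates(values, ndigits=7, bbox=None):
    """Group a flat list of numbers into de-duplicated (lat, lon) pairs.

    values:  floats or numeric strings, ASSUMED to alternate latitude, longitude.
    ndigits: rounding used to merge float noise (e.g. ...0000004 vs ...0000003).
    bbox:    optional (min_lat, max_lat, min_lon, max_lon); pairs outside it are
             dropped, which also removes non-coordinate values such as 0.25 / 1.0.
    """
    nums = [float(v) for v in values]
    seen = set()
    pairs = []
    # zip over even/odd positions; a trailing unpaired value is ignored
    for lat, lon in zip(nums[0::2], nums[1::2]):
        key = (round(lat, ndigits), round(lon, ndigits))
        if bbox is not None:
            min_lat, max_lat, min_lon, max_lon = bbox
            if not (min_lat <= key[0] <= max_lat and min_lon <= key[1] <= max_lon):
                continue
        if key not in seen:  # keep first occurrence, preserve order
            seen.add(key)
            pairs.append(key)
    return pairs
```

For example, passing the list returned by `re.findall` above with `bbox=(9.5, 15.0, 102.0, 108.0)` (a rough box around Cambodia, my assumption) would also discard the leading values, which appear to be in longitude-first order. The names still have to come from elsewhere, which is why the .kmz route is more convenient.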

Upvotes: 1

Razvan

Reputation: 387

If the data isn't exposed in the HTML, I wouldn't bother with this approach. You can use the Google Geocoding API instead (see its documentation).

You just need to pass the address to the API and you will receive more than you need; then, of course, you can pick out only the fields you want.
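A minimal sketch of that route using only the standard library, assuming you have a Geocoding API key and that the JSON response has the usual `results[0].geometry.location` shape. The helper names are hypothetical, and the actual request needs network access.

```python
import json
import urllib.parse
import urllib.request

GEOCODE_ENDPOINT = "https://maps.googleapis.com/maps/api/geocode/json"


def build_geocode_url(address, api_key):
    """Build a Geocoding API request URL for a free-text address."""
    query = urllib.parse.urlencode({"address": address, "key": api_key})
    return f"{GEOCODE_ENDPOINT}?{query}"


def extract_lat_lng(payload):
    """Return (lat, lng) of the first result in a decoded JSON response, or None.

    Assumes the documented response layout: results[0].geometry.location.
    """
    if payload.get("status") != "OK" or not payload.get("results"):
        return None
    location = payload["results"][0]["geometry"]["location"]
    return location["lat"], location["lng"]


def geocode(address, api_key):
    """Perform the HTTP request (requires network access and a valid key)."""
    with urllib.request.urlopen(build_geocode_url(address, api_key)) as resp:
        return extract_lat_lng(json.load(resp))
```

For example, `geocode('Kampot, Cambodia', 'YOUR_API_KEY')` with your own key in place of the placeholder. You still need each shop's address from somewhere, and geocoding a shop name may resolve to the town rather than the shop itself.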

Upvotes: 1
