Reputation: 79
from bs4 import BeautifulSoup
import requests
import pandas as pd
import re
import json
source = requests.get('https://www.cellcard.com.kh/en/detail/cellcard-shops/').text.encode('utf8').decode('ascii', 'ignore')
response = BeautifulSoup(source, 'lxml')
I don't know how to go on from here to extract the latitude, longitude and location name of each shop. Should I just inspect the network traffic and copy the values manually? The page link and the frame source I found in the inspector are below.
(link=https://www.cellcard.com.kh/en/detail/cellcard-shops/)
(view frame source=https://www.google.com/maps/d/u/0/embed?mid=1zDbb1_TelVm0aEEXqRlxsmNTghtOc1Ug&ll=15.243727686460069%2C107.72306438411047&z=6)
Upvotes: 1
Views: 779
Reputation: 195553
You can download the .kmz data, which contains the place names and coordinates. All you need is the Google Maps map ID:
import io
import zipfile
import requests
from bs4 import BeautifulSoup
url = 'https://www.cellcard.com.kh/en/detail/cellcard-shops/'
soup = BeautifulSoup( requests.get(url).content, 'html.parser' )
google_map_url = soup.select_one('h2:contains("Find Cellcard shops near you!") + h2 iframe')['src']
google_map_id = google_map_url.split('=')[-1]
open_map_data_url = 'https://www.google.com/maps/d/kml?mid=' + google_map_id # <-- retrieve URL of .kmz file
# unpack the .kmz file and print the data
r = requests.get(open_map_data_url, stream=True)
z = zipfile.ZipFile(io.BytesIO(r.content))
soup = BeautifulSoup(z.read('doc.kml'), 'html.parser')
for name, coord in zip(soup.select('placemark name'), soup.select('placemark coordinates')):
    print('{:<40} {}'.format(name.get_text(strip=True), coord.get_text(strip=True)))
Prints:
Cellcard Shop - Svay Rieng 105.7973236,11.0828573,0
Cellcard Shop - Battambang 103.1971734,13.1016807,0
Cellcard Shop - Preah Vihea 104.97836,13.80754,0
Cellcard Shop - Oudong 104.74631,11.82251,0
Cellcard Shop - Pailin 102.60507,12.85738,0
Cellcard Shop - Memot 106.182635,11.8281502,0
Cellcard Shop - Ta Khmao 104.94545,11.48132,0
Cellcard Shop - Siem Reap 103.85765,13.35747,0
Cellcard Shop - Kampong Cham 105.46173,11.98676,0
Cellcard Shop - Banteay Mean Chey 102.9734105,13.5847979,0
Cellcard Shop- Kratie 106.01692,12.49176,0
Cellcard Shop - Poi Pet 102.564986,13.6515218,0
Cellcard Shop - Rattanakiri 106.98667,13.73767,0
Cellcard Shop - Saang 105.00483,11.36499,0
Cellcard Shop - Ang Ta Saom 104.67213,11.01284,0
Cellcard Shop - Kralanh 103.41696,13.58708,0
Cellcard Shop - Kampong Speu 104.52267,11.46298,0
Cellcard Shop - KohKong 102.98434,11.61512,0
Cellcard Shop - Kampot 104.1817547,10.6088833,0
Cellcard Shop - Pursat 103.91766,12.53332,0
Cellcard Shop - Mondulkiri 107.18583,12.45962,0
Cellcard Shop - Skun 105.07256,12.05591,0
Cellcard Shop - Oudor Mean Chey 103.50693,14.17684,0
Cellcard Shop - Stueng Treng 105.97041,13.52817,0
Cellcard Shop - Kampong Thom 105.2194808,12.90616,0
Cellcard Shop - Neak Loeung 105.28958,11.26162,0
Cellcard Shop - Bavil 102.87552,13.25641,0
Cellcard Shop - Punley 104.46283,12.43703,0
Cellcard Shop - Dom Dek 104.12392,13.24428,0
Cellcard Shop - Thmor Kol 103.0893354,13.2626185,0
Cellcard Shop - Bek Chan 104.74816,11.51111,0
Cellcard Shop - Prek Anhchanh 104.97139,11.7317,0
Cellcard Shop - Kampong Trach 104.4649104,10.5589497,0
Cellcard Shop - Moung Russey 103.448808,12.774255,0
Cellcard Shop - Stoung 104.57211,12.93958,0
Cellcard Shop - Takeo 104.7797767,10.9847041,0
Cellcard Shop - Kampong Chhnang 104.66808,12.25278,0
Cellcard Shop - Sdoa 102.97318,12.89153,0
Cellcard Shop - Kampong Thmor 105.124583,12.4993313,0
Cellcard Shop - Chhouk 104.45618,10.83606,0
Cellcard Shop - Krokor 104.2099,12.53235,0
Cellcard Shop - Samraong Yaong 104.82445,11.26427,0
Cellcard Shop - Sihanoukville 103.52539,10.62278,0
Cellcard 104.925084,11.556262,0
Cellcard Shop - Kampuchea Krom 104.910777,11.5683444,0
Cellcard Shop - Toul Tom Poung 104.9154285,11.5400873,0
Cellcard Shop - Sovanna 104.9007255,11.5451703,0
Cellcard Shop - Camko 104.8970584,11.5912519,0
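Two details of the approach above are easy to trip over, so here is a minimal sketch of how they could be handled. It is not part of the original answer: the helper names `map_id_from_embed_url` and `parse_placemarks` are made up for illustration, and the sample KML is a hand-written fragment, not the real file. First, if the iframe URL carries extra parameters after `mid` (as the embed URL in the question does), splitting on `=` would return the wrong piece, so the sketch reads the `mid` query parameter instead. Second, KML stores each coordinate as longitude,latitude,altitude, so the values need swapping if you want lat/long.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse, parse_qs


def map_id_from_embed_url(embed_url):
    """Return the value of the `mid` query parameter, or None if it is absent."""
    values = parse_qs(urlparse(embed_url).query).get("mid")
    return values[0] if values else None


def _local(tag):
    """Strip an XML namespace prefix such as '{http://www.opengis.net/kml/2.2}'."""
    return tag.rsplit("}", 1)[-1]


def parse_placemarks(kml):
    """Yield (name, latitude, longitude) for every placemark that has coordinates."""
    root = ET.fromstring(kml)
    for element in root.iter():
        if _local(element.tag) != "Placemark":
            continue
        name, coords = None, None
        for child in element.iter():
            tag = _local(child.tag)
            if tag == "name" and name is None:
                name = (child.text or "").strip()
            elif tag == "coordinates" and coords is None:
                coords = (child.text or "").strip()
        if not coords:
            continue
        # KML order is longitude,latitude[,altitude]; use the first tuple only.
        lon, lat = coords.split()[0].split(",")[:2]
        yield name, float(lat), float(lon)


# Hand-written sample standing in for the contents of doc.kml.
sample_kml = """<kml xmlns="http://www.opengis.net/kml/2.2"><Document>
<Placemark><name>Cellcard Shop - Svay Rieng</name>
<Point><coordinates>105.7973236,11.0828573,0</coordinates></Point></Placemark>
</Document></kml>"""

for name, lat, lon in parse_placemarks(sample_kml):
    print(name, lat, lon)
```

With the real data you would pass `z.read('doc.kml')` to `parse_placemarks` instead of the sample string.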
Upvotes: 1
Reputation: 33384
Request the frame source URL and use a regex to find the latitude and longitude values.
import re
import requests
from bs4 import BeautifulSoup
source = requests.get('https://www.google.com/maps/d/u/0/embed?mid=1zDbb1_TelVm0aEEXqRlxsmNTghtOc1Ug&ll=15.243727686460069%2C107.72306438411047&z=6').text
response = BeautifulSoup(source, 'lxml')
# match every decimal number in the page text
matches = re.findall(r"(\d+\.\d+)", response.text)
for match in matches:
    print(match)
Output:
102.56498599999998
14.17684
107.18583000000001
10.5589497
102.56498599999998
14.17684
107.18583000000001
10.5589497
11.0828573
105.79732360000004
11.0828573
105.79732360000003
13.1016807
103.1971734
13.1016807
103.1971734
13.80754
104.97836000000007
13.80754
104.97836000000007
11.82251
104.74631
11.82251
104.74631
12.85738
102.60507000000007
12.85738
102.60507000000007
11.8281502
106.182635
11.8281502
106.182635
11.48132
104.94544999999994
11.48132
104.94544999999994
13.35747
103.85765000000004
13.35747
103.85765000000004
11.98676
105.46172999999999
11.98676
105.46172999999999
13.5847979
102.9734105
13.5847979
102.9734105
12.49176
106.01692000000003
12.49176
106.01692000000003
13.6515218
102.56498599999998
13.6515218
102.56498599999998
13.737670000000001
106.98667
13.73767
106.98667
11.36499
105.00482999999997
11.36499
105.00482999999997
11.01284
104.67212999999992
11.01284
104.67212999999992
13.58708
103.41696000000002
13.58708
103.41696000000002
11.46298
104.52267000000006
11.46298
104.52267000000006
11.61512
102.98433999999997
11.61512
102.98433999999997
10.6088833
104.18175470000006
10.6088833
104.18175470000006
12.53332
103.91766000000007
12.53332
103.91766000000007
12.45962
107.18583000000001
12.45962
107.18583000000001
12.055910000000003
105.07256000000007
12.05591
105.07256000000007
14.17684
103.50693000000001
14.17684
103.50693000000001
13.52817
105.97041000000002
13.52817
105.97041000000002
12.90616
105.21948079999993
12.90616
105.21948079999993
11.26162
105.28958000000002
11.26162
105.28958
13.25641
102.87552000000007
13.25641
102.87552000000005
12.43703
104.46282999999994
12.43703
104.46282999999994
13.24428
104.12392
13.24428
104.12392
13.2626185
103.08933539999998
13.2626185
103.08933539999998
11.51111
104.74815999999998
11.51111
104.74815999999998
11.7317
104.97138999999993
11.7317
104.97138999999993
10.5589497
104.46491040000001
10.5589497
104.46491040000001
12.774255
103.44880799999999
12.774255
103.44880799999999
12.93958
104.57211000000007
12.93958
104.57211000000007
10.9847041
104.77977669999996
10.9847041
104.77977669999996
12.25278
104.66808000000003
12.25278
104.66808000000003
12.89153
102.97317999999996
12.89153
102.97317999999996
12.4993313
105.12458300000003
12.4993313
105.12458300000003
10.83606
104.45618000000002
10.83606
104.45618000000002
12.53235
104.20990000000008
12.53235
104.20990000000006
11.26427
104.82445000000007
11.26427
104.82445000000007
10.62278
103.52539000000002
10.62278
103.52539000000002
11.0828573
105.79732360000003
13.1016807
103.1971734
13.80754
104.97836000000007
11.82251
104.74631
12.85738
102.60507000000007
11.8281502
106.182635
11.48132
104.94544999999994
13.35747
103.85765000000004
11.98676
105.46172999999999
13.5847979
102.9734105
12.49176
106.01692000000003
13.6515218
102.56498599999998
13.73767
106.98667
11.36499
105.00482999999997
11.01284
104.67212999999992
13.58708
103.41696000000002
11.46298
104.52267000000006
11.61512
102.98433999999997
10.6088833
104.18175470000006
12.53332
103.91766000000007
12.45962
107.18583000000001
12.05591
105.07256000000007
14.17684
103.50693000000001
13.52817
105.97041000000002
12.90616
105.21948079999993
11.26162
105.28958
13.25641
102.87552000000005
12.43703
104.46282999999994
13.24428
104.12392
13.2626185
103.08933539999998
11.51111
104.74815999999998
11.7317
104.97138999999993
10.5589497
104.46491040000001
12.774255
103.44880799999999
12.93958
104.57211000000007
10.9847041
104.77977669999996
12.25278
104.66808000000003
12.89153
102.97317999999996
12.4993313
105.12458300000003
10.83606
104.45618000000002
12.53235
104.20990000000006
11.26427
104.82445000000007
10.62278
103.52539000000002
0.25
1.0
0.2980392156862745
1.0
11.556262
104.925084
11.556262
104.925084
11.5683444
104.910777
11.5683444
104.910777
11.5400873
104.9154285
11.5400873
104.9154285
11.5451703
104.9007255
11.5451703
104.9007255
11.5912519
104.89705840000002
11.5912519
104.8970584
11.556262
104.925084
11.5683444
104.910777
11.5400873
104.9154285
11.5451703
104.9007255
11.5912519
104.8970584
0.25
1.0
0.2980392156862745
1.0
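The raw list above repeats each value several times and mixes in unrelated numbers (0.25, 1.0 and so on), so some post-processing is needed. The following is only a rough sketch of one way to do it, not part of the original answer: the function name `pair_coordinates` and the bounding box for Cambodia are assumptions chosen for this page. It walks the numbers, keeps consecutive values that look like a latitude followed by a longitude, and drops repeats. The viewport bounds at the very start of the output can still slip through as spurious pairs, so check the result by hand.

```python
import re

# Assumed rough bounding box for Cambodia; adjust for other regions.
LAT_RANGE = (9.0, 15.0)
LON_RANGE = (102.0, 108.0)


def pair_coordinates(text):
    """Return unique (lat, lon) pairs found as consecutive decimals in text."""
    numbers = [float(m) for m in re.findall(r"\d+\.\d+", text)]
    seen = set()
    pairs = []
    i = 0
    while i < len(numbers) - 1:
        lat, lon = numbers[i], numbers[i + 1]
        is_lat = LAT_RANGE[0] <= lat <= LAT_RANGE[1]
        is_lon = LON_RANGE[0] <= lon <= LON_RANGE[1]
        if is_lat and is_lon:
            # Round only for the duplicate check, since the page repeats
            # values with tiny floating-point differences.
            key = (round(lat, 6), round(lon, 6))
            if key not in seen:
                seen.add(key)
                pairs.append((lat, lon))
            i += 2  # consume both numbers of the pair
        else:
            i += 1  # slide forward by one and try again
    return pairs


sample = "11.0828573 105.79732360000004 11.0828573 105.79732360000003 0.25 1.0 13.1016807 103.1971734"
for lat, lon in pair_coordinates(sample):
    print(lat, lon)
```

Note that this gives coordinates only, without the shop names.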
Upvotes: 1
Reputation: 387
If the data is not exposed in the HTML code, I wouldn't bother with the scraping approach. You can use the Google Geocoding API instead (see its documentation).
You just pass the address to the API and you will receive more than you need; you can then pick out only the fields you want.
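As a rough illustration of that suggestion, the sketch below builds a request URL for the Geocoding API JSON endpoint and pulls the latitude and longitude out of a response. The helper names, the `YOUR_API_KEY` placeholder and the sample payload are assumptions for illustration; the payload is trimmed to the few fields the code reads, and you need your own API key to make a real request.

```python
from urllib.parse import urlencode

GEOCODE_ENDPOINT = "https://maps.googleapis.com/maps/api/geocode/json"


def build_geocode_url(address, api_key):
    """Build the request URL for one address lookup."""
    return GEOCODE_ENDPOINT + "?" + urlencode({"address": address, "key": api_key})


def extract_lat_lng(payload):
    """Return (lat, lng) of the first result, or None if the lookup failed."""
    if payload.get("status") != "OK" or not payload.get("results"):
        return None
    location = payload["results"][0]["geometry"]["location"]
    return location["lat"], location["lng"]


# Hand-written payload, trimmed to the fields used above.
sample_payload = {
    "status": "OK",
    "results": [{"geometry": {"location": {"lat": 13.35747, "lng": 103.85765}}}],
}

print(build_geocode_url("Cellcard Shop, Siem Reap", "YOUR_API_KEY"))
print(extract_lat_lng(sample_payload))
```

For a live lookup you would fetch the URL, for example with `requests.get(url).json()`, and pass the result to `extract_lat_lng`.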
Upvotes: 1