Reputation: 55
I have to scrape 3 elements from this website:
http://www.altitude-maps.com/city/170_562,Poznan,Wielkopolskie,Poland
I need latitude, longitude and elevation, so my code is:
import requests
from bs4 import BeautifulSoup as bs
url = 'http://www.altitude-maps.com/city/170_562,Poznan,Wielkopolskie,Poland'
r = requests.get(url)
soup = bs(r.content, features="html.parser")
latitude = soup.find('span', attrs={'id': 'curLat'}).get_text()
longitude = soup.find('span', attrs={'id': 'curLng'}).get_text()
elevation1 = soup.find('span', attrs={'id': 'altitude'}).get_text() # from the text in the center
elevation2 = soup.find('span', attrs={'id': 'curElevation'}).get_text() # from the box in the left
It finds values for the latitude and the longitude, but not for the elevation (in both cases). Instead of '80.33 m (263.55 ft)' and '80.33 m', I get a single space and an empty string.
Comparison of the HTML from BeautifulSoup and from the website:
BS_elevation1 = soup.find('span', attrs={'id': 'altitude'})
# BS_elevation1: <span id="altitude" style="font-size: 1.5em;"> </span>
# This part on the website: <span id="altitude" style="font-size: 1.5em;">80.33 m (263.55 ft)</span>
BS_elevation2 = soup.find('span', attrs={'id': 'curElevation'})
# BS_elevation2: <span id="curElevation" style=""></span>
# This part on the website: <span id="curElevation" style>80.33 m</span>
It seems the text is visible on the website but missing from what BeautifulSoup returns. I can't understand why this happens. How can I work around it?
Upvotes: 1
Views: 129
Reputation: 11505
import httpx
import trio
import re
async def main():
    async with httpx.AsyncClient(timeout=None) as client:
        r = await client.get('http://www.altitude-maps.com/city/170_562,Poznan,Wielkopolskie,Poland')
        goal = re.findall(r"(lati|long|elev).*?'(.+)'", r.text)
        print(goal)
if __name__ == "__main__":
    trio.run(main)
Output:
[('lati', '52.4063740'), ('long', '16.9251681'), ('elev', '80.329216003418')]
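The list of tuples above can be turned into a dictionary of floats for further use. This is a minimal sketch that takes the output shown above as literal input; the readable key names in `FULL_NAMES` are my own choice, not something the page provides.

```python
# Output of re.findall() from the answer above, pasted in as literal data.
goal = [('lati', '52.4063740'), ('long', '16.9251681'), ('elev', '80.329216003418')]

# Hypothetical mapping from the truncated regex prefixes to readable names.
FULL_NAMES = {'lati': 'latitude', 'long': 'longitude', 'elev': 'elevation'}

# Build {name: float} from the (prefix, value) pairs.
coords = {FULL_NAMES[key]: float(value) for key, value in goal}
print(coords)
```

Converting to `float` early makes it easier to round or compare the values later, at the cost of losing the exact string the site returned.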
Upvotes: 2
Reputation: 84465
A similar regex idea, but using a dictionary comprehension:
import re, requests
items = ['latitude', 'longitude', 'elevation']
r = requests.get('http://www.altitude-maps.com/city/170_562,Poznan,Wielkopolskie,Poland').text
data = {item: re.search(r"(?<={}).*'(.*?)'".format(item), r).group(1) for item in items}
print(data)
Upvotes: 1
Reputation: 195418
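The information you're looking for is embedded in the page inside a <script> tag, and the <span> elements are only filled in by JavaScript in the browser, so they are empty in the HTML that BeautifulSoup parses. You can, however, use the re module to pull the values out of the page source.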
To get latitude, longitude and elevation you can use this example:
import re
import requests
url = "http://www.altitude-maps.com/city/170_562,Poznan,Wielkopolskie,Poland"
text = requests.get(url).text
lat = re.search(r"geoplugin_latitude.*?([\d.-]+)", text).group(1)
lon = re.search(r"geoplugin_longitude.*?([\d.-]+)", text).group(1)
elv = re.search(r"geoip_elevation.*?([\d.-]+)", text).group(1)
print("Latitude:", lat)
print("Longitude:", lon)
print("Elevation:", elv)
Prints:
Latitude: 52.4063740
Longitude: 16.9251681
Elevation: 80.329216003418
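Note that `re.search` returns `None` when a pattern is not found, so the `.group(1)` calls above raise `AttributeError` if the site changes its markup. Below is a minimal sketch of a more defensive variant. The helper name `extract_float` and the sample `text` are assumptions for illustration; the real page's script may be laid out differently.

```python
import re
from typing import Optional


def extract_float(name: str, text: str) -> Optional[float]:
    """Return the first number following `name` on the same line, or None."""
    match = re.search(re.escape(name) + r".*?(-?\d+(?:\.\d+)?)", text)
    return float(match.group(1)) if match else None


# Hypothetical sample standing in for the downloaded page source.
text = """
var geoplugin_latitude = '52.4063740';
var geoplugin_longitude = '16.9251681';
var geoip_elevation = '80.329216003418';
"""

lat = extract_float("geoplugin_latitude", text)
lon = extract_float("geoplugin_longitude", text)
elv = extract_float("geoip_elevation", text)
missing = extract_float("geoip_country", text)  # not in the sample, so None

print(lat, lon, elv, missing)
```

Returning `None` instead of raising lets the caller decide whether a missing field is an error or can simply be skipped.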
Upvotes: 1
Reputation: 54678
That happens because the elevation is not filled in when the page is served. Do a "view source" in your browser and you'll see that the spans are empty; the value is filled in afterwards by JavaScript.
Do note, however, that the data you want is all present in the JavaScript code in the second script block. That should be pretty easy to parse.
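One way to do that parsing is to collect the text of each <script> block and then search it for quoted assignments. This is a sketch only: the sample HTML is made up to resemble what the other answers match on (names such as `geoplugin_latitude`, values in single quotes), the class name `ScriptCollector` is hypothetical, and the real page may differ.

```python
import re
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collect the text content of every <script> element."""

    def __init__(self):
        super().__init__()
        self.scripts = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script text may arrive in several chunks, so append to the last entry.
        if self._in_script:
            self.scripts[-1] += data


# Hypothetical page source; the real site's markup is an assumption here.
html = """
<html><body>
<script>var unrelated = 1;</script>
<script>
var geoplugin_latitude = '52.4063740';
var geoplugin_longitude = '16.9251681';
var geoip_elevation = '80.329216003418';
</script>
<span id="altitude"> </span>
</body></html>
"""

collector = ScriptCollector()
collector.feed(html)

# The second script block is at index 1.
second_block = collector.scripts[1]

# Match assignments of the form: name = 'value'
values = dict(re.findall(r"(\w+)\s*=\s*'([^']*)'", second_block))
print(values)
```

With the real site you would pass `requests.get(url).text` to `collector.feed()` instead of the sample string, and check how many script blocks were found before indexing into them.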
Upvotes: 1