Reputation: 73
I know there are many similar questions, but I've been through all of those and they couldn't help me. I'm trying to get information from a website, and I've used the same method on other websites with success. Here however, it doesn't work. I would very much appreciate if somebody could give me a few tips!
I want to get the max temperature for tomorrow from this website.
import re, requests, time
from lxml import html
page = requests.get('http://www.weeronline.nl/Europa/Nederland/Amsterdam/4058223')
tree = html.fromstring(page.content)
a = tree.xpath('//*[@id="app"]/div/div[2]/div[5]/div[2]/div[2]/div[6]/div/div/div/div/div/div/ul/div[2]/div/li[1]/div/span/text()')
print(a)
This returns an empty list, however. The same method on a few other websites I checked worked fine. I've tried applying this method on other parts of this website and this domain, all to no avail.
Thanks for any and all help! Best regards
Upvotes: 0
Views: 119
Reputation: 21643
Notice that when you try to open that page you are asked whether you agree to allow cookies. (It's something like that, I have no Dutch.) You will need to use something like selenium to click on a button to 'OK' that so that you have access to the page that you really want. Then you can use the technique discussed at Web Scrape page with multiple sections to be able to get the HTML for that page, and finally apply whatever xpath it takes to retrieve the content that you want.
Upvotes: 1