Nick Syiek

Reputation: 55

Getting the XPath from an HTML document

https://next.newsimpact.com/NewsWidget/Live

I am trying to write a Python script that grabs a value from an HTML table on the page linked above. Below is the code I have written so far. I suspect my XPath is incorrect: the same approach has worked fine on other elements, but the path I'm using here does not return or print anything.

from lxml import html
import requests
page = requests.get('https://next.newsimpact.com/NewsWidget/Live')
tree = html.fromstring(page.content)

# Select the text of the 4th cell in the table's first row (xpath() returns a list):
value = tree.xpath('//*[@id="table9521"]/tr[1]/td[4]/text()')

print('Value: ', value)

What is strange is that when I open the page's source view, I can't find the table I am trying to pull from. Thank you for your help!

Upvotes: 1

Views: 290

Answers (2)

Andersson

Reputation: 52675

The required data is absent from the initial page source; it is loaded afterwards via XHR. You can get it as below:

import requests

response = requests.get('https://next.newsimpact.com/NewsWidget/GetNextEvents?offset=-120').json()

first_previous = response['Items'][0]['Previous']  # Current output - "2.632"
second_previous = response['Items'][1]['Previous']  # Currently - "0.2"
first_forecast = response['Items'][0]['Forecast']  # ""
second_forecast = response['Items'][1]['Forecast']  # "0.3"

You can treat the response as a plain Python dict and read all the required data from it.
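As a rough sketch of walking that dict, the helper below collects the "Previous" and "Forecast" fields for every entry in "Items". The function names `summarize_items` and `fetch_events`, the 10-second timeout, and the sample payload are illustrative assumptions; only the endpoint URL and the "Items", "Previous", and "Forecast" keys come from the snippet above, and the real response may contain other fields.

```python
def summarize_items(payload):
    """Return (previous, forecast) pairs from a GetNextEvents-style dict.

    Missing keys fall back to an empty string instead of raising KeyError.
    """
    rows = []
    for item in payload.get("Items", []):
        rows.append((item.get("Previous", ""), item.get("Forecast", "")))
    return rows


def fetch_events(offset=-120):
    """Fetch the JSON payload from the XHR endpoint (hypothetical helper)."""
    import requests  # imported lazily so summarize_items works offline

    url = "https://next.newsimpact.com/NewsWidget/GetNextEvents"
    response = requests.get(url, params={"offset": offset}, timeout=10)
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.json()


# Made-up sample shaped like the values quoted in the answer above.
sample = {
    "Items": [
        {"Previous": "2.632", "Forecast": ""},
        {"Previous": "0.2", "Forecast": "0.3"},
    ]
}

for previous, forecast in summarize_items(sample):
    print("Previous:", previous, "Forecast:", forecast)
```

Against the live site, this would be called as `summarize_items(fetch_events())`, assuming the endpoint still responds with the same structure.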

Upvotes: 1

Gilles Quénot

Reputation: 185530

Your problem is simple: requests doesn't handle JavaScript at all. The values are generated by JS!

If you really need to run this, you need to use a module capable of understanding JS, such as spynner.

You can test whether you need JS by first fetching the page with a tool that doesn't execute JS, or by disabling JS in your browser. In Firefox: enter about:config in the navigation bar, search for javascript.enabled, then double-click it to toggle between true and false.

In Chrome, open the Chrome dev tools; there's an option for it somewhere.
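The same check can be scripted: if the element id you are targeting never appears in the raw HTML that a non-JS client receives, the element is being created by JavaScript. The sketch below uses only the standard library; `IdCollector`, `static_html_has_id`, and both sample HTML strings are made up for illustration and are not taken from the real page.

```python
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Collect every id attribute seen in an HTML document."""

    def __init__(self):
        super().__init__()
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value:
                self.ids.add(value)


def static_html_has_id(html_text, element_id):
    """Return True if element_id occurs as an id in the given HTML text."""
    collector = IdCollector()
    collector.feed(html_text)
    return element_id in collector.ids


# Hypothetical markup: what a non-JS client might receive ...
static_page = '<html><body><div id="widget"></div><script src="app.js"></script></body></html>'
# ... versus what the browser shows after scripts have run.
rendered_page = '<html><body><table id="table9521"><tr><td>2.632</td></tr></table></body></html>'

print(static_html_has_id(static_page, "table9521"))
print(static_html_has_id(rendered_page, "table9521"))
```

In practice you would pass `page.text` from the `requests.get(...)` call in the question as `html_text`; a False result there suggests the table only exists after JS has run.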

Check https://github.com/makinacorpus/spynner

Another (possible) problem: use tree = html.fromstring(page.text), not tree = html.fromstring(page.content).

Upvotes: 1
