Reputation: 11
I am fairly new to HTML and JSON and am struggling a little to extract the data I am after in a usable format in Python for a Raspberry Pi project.
I am using a device which outputs some live data over a Wi-Fi link in the form of an HTML page. Although the data shown on the page can be changed, I am only really concerned with getting data from a single page for now. When viewed in Notepad++ the page looks like:
<!DOCTYPE html>
<html><head><meta http-equiv="Content-Type" content="text/html; charset=windows-1252"><style>.b{position:absolute;top:0;bottom:0;left:0;right:0;height:100%;background-color:#000;height:auto !important;}.f{border-radius: 10px;font-weight:bold;position:absolute;top:50%;left:0;right:0;margin:auto;background:#024d27;padding:50px;box-sizing:border-box;color:#FF0;margin:30px;box-shadow:0px 2px 18px -4px #0F0;transform:translateY(-50%);}#V{font-size:96px;}#U{font-size: 56px;}#N{font-size: 36px;}</style></head><body><div class="b"><div class="f"><span id="N">Voltage</span><br><span id="V">12.53</span> <span id="U">V</span><br></div></div><script>reqData();setInterval(reqData, 200);function reqData() {var xhr = new XMLHttpRequest();xhr.onload = function() {if (this.status == 200) {var data = JSON.parse(xhr.responseText);document.getElementById('N').innerHTML = data.n;document.getElementById('V').innerHTML = data.v;document.getElementById('U').innerHTML = data.u;} else {document.getElementById('N').innerHTML = "?";document.getElementById('V').innerHTML = "?";document.getElementById('U').innerHTML = "?";}};xhr.open('GET', 'readVal', true);xhr.send();}</script></body></html>
As you can see, it is a fairly simple page which just provides the information I am trying to extract, presented in a green box with yellow text on a black background.
From staring at the source a little, the information I am trying to extract is that associated with the span IDs 'V' (voltage), 'N' (name) and 'U' (units).
The data is displayed live on the webpage (i.e. it updates every 200 ms, I think, without refreshing the page) and I would like to extract the values as frequently as possible.
I have tried a few different blocks of code and methods, and this is the only one I have had any success with so far:
import urllib.request, json, html
data = urllib.request.urlopen("http://192.168.4.1").read()
print (data)
This correctly returns the HTML source code for the page (albeit with a delay of about 5 seconds, which may just be related to the low spec of the Pi Zero I am running it on).
However, I don't seem to be able to extract the JSON data from within this. I have tried:
data_json = json.loads(data)
but this gives me a JSONDecodeError: expecting value: line 1 column 1 (char 0), which I assume is because 'data' is still a mix of HTML and JSON. I have also noticed that the actual values I am trying to retrieve (Voltage, 12.53 and V in the example source page at the top) are just shown as '?' placeholders when I open the page using urllib, rather than the actual values shown on the page.
Is anyone able to offer me any pointers at all please?
Thanks in advance, Steve
Upvotes: 0
Views: 353
Reputation: 3662
As you've noticed from the error message and the raw HTML, the result you're getting from your device isn't JSON data, it's HTML with JavaScript. It looks like the HTML you posted makes an AJAX request (a JavaScript GET request) to a local endpoint (/readVal perhaps?).
Try opening http://192.168.4.1 in your browser, open dev tools, and observe what network requests the page makes under the hood. Specifically, look for XHR requests. Look at the request URL and response; I bet you'll find a local endpoint that returns the raw JSON data you want.
Or just try http://192.168.4.1/readVal and see if that's it.
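If that endpoint does return JSON, something along these lines might work from Python. This is only a sketch under assumptions: the URL is a guess based on the relative 'readVal' path in the page's script, the keys n, v and u are taken from what that script reads out of the response, and the function names (parse_reading, read_value, poll) are made up for illustration. I don't know whether the device sends the value as a string or a number, so the code leaves it as-is.

```python
import json
import time
import urllib.request

# Assumed endpoint: the page's script requests the relative URL 'readVal'.
URL = "http://192.168.4.1/readVal"


def parse_reading(raw):
    """Decode a JSON payload and return (name, value, unit).

    The keys 'n', 'v' and 'u' are the ones the page's script reads.
    """
    data = json.loads(raw)
    return data["n"], data["v"], data["u"]


def read_value(url=URL, timeout=2.0):
    """Fetch one reading from the device."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_reading(response.read())


def poll(interval=0.2, count=5):
    """Print a few readings, pausing `interval` seconds between requests."""
    for _ in range(count):
        try:
            name, value, unit = read_value()
            print(name, value, unit)
        except (OSError, ValueError, KeyError) as exc:
            # Network errors, invalid JSON, or a missing key.
            print("read failed:", exc)
        time.sleep(interval)
```

Calling poll() would then print a handful of readings roughly 200 ms apart, matching the interval the page itself uses. Requesting only the small JSON response instead of the whole page may also help with the delay, though that depends on the device.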
Upvotes: 1