Getting text from HTML using BeautifulSoup

Question

I'm trying to get the current "5-minute trend price" from the website of my electricity provider using Python 2.7 and BeautifulSoup4.

The XPath is: //html/body/div[2]/div/div/div[3]/p[1]

or


  
    "5.2"  # this is what I'm ultimately after
    ¢
     per kWh
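BeautifulSoup does not evaluate XPath expressions, but a sketch can show what the XPath above selects. The standard library's xml.etree.ElementTree understands a subset of XPath that includes 1-based positional predicates such as div[2]. The markup below is a hypothetical, well-formed skeleton standing in for the page as the browser renders it, not the real page source; real HTML is usually not well-formed XML, so this only illustrates how the path walks the tree.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed skeleton of the rendered page.
# Only the nesting matters here; this is NOT the real page's markup.
RENDERED_SKELETON = """
<html><body>
  <div>header</div>
  <div><div><div>
    <div>menu</div>
    <div>chart</div>
    <div><p>5.2 per kWh</p><p>other text</p></div>
  </div></div></div>
</body></html>
"""


def text_at(xml_text, path):
    """Return the text of the first element matching `path` under the root."""
    root = ET.fromstring(xml_text)  # root is the <html> element itself
    elem = root.find(path)
    return None if elem is None else elem.text


# The question's XPath, rewritten relative to <html>. ElementTree counts
# a positional predicate like div[3] among same-tag siblings, as XPath does.
PATH = "./body/div[2]/div/div/div[3]/p[1]"
print(text_at(RENDERED_SKELETON, PATH))  # 5.2 per kWh
```

If the value is missing from the HTML the server actually sends, no path expression can find it, whichever library evaluates it.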

I've tried many different ways of getting the "5.2" value and have managed to drill down to the "instant prices" object, but I can't get anything out of it.

My current code looks like this:

import urllib2
from bs4 import BeautifulSoup

url = "https://rrtp.comed.com/live-prices/"

soup = BeautifulSoup(urllib2.urlopen(url).read())
#print soup

instantPrices = soup.findAll('div', 'instant prices')
print instantPrices

...and the output is:

[
]
[]

No matter what I try, the "instant prices" object appears to be empty, even though I can clearly see its contents when inspecting the element in Chrome. Any help would be hugely appreciated! Thank you!
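Chrome's element inspector shows the live DOM after scripts have run, which can differ from the HTML the server sends. A standard-library sketch of the lookup in the code above (using html.parser instead of BeautifulSoup) makes the difference concrete: it collects the text inside any div whose class list contains both "instant" and "prices". The two snippets are hypothetical stand-ins, one for the div as a script-driven page might be served and one for the same div after a script has filled it in.

```python
from html.parser import HTMLParser


class DivTextCollector(HTMLParser):
    """Collect text inside every <div> whose class list has all wanted tokens."""

    def __init__(self, wanted_classes):
        super().__init__()  # convert_charrefs=True: "&cent;" arrives as "¢"
        self.wanted = set(wanted_classes)
        self.depth = 0  # > 0 while inside a matching div (tracks nested divs)
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth:
            self.depth += 1  # a div nested inside a match
            return
        classes = set((dict(attrs).get("class") or "").split())
        if self.wanted <= classes:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def div_text(html, *classes):
    """Whitespace-normalised text of the matching div(s); "" if none or empty."""
    parser = DivTextCollector(classes)
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


# Hypothetical snippets: served (empty) versus filled in by a script.
SERVED = '<div class="instant prices"></div>'
RENDERED = ('<div class="instant prices"><p>5.2<small>&cent;</small>'
            ' per kWh</p></div>')

print(repr(div_text(SERVED, "instant", "prices")))    # ''
print(repr(div_text(RENDERED, "instant", "prices")))  # '5.2 ¢ per kWh'
```

Running the same check against the source you actually download (rather than what the inspector shows) tells you whether the value is there to be parsed at all.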

elyase · Accepted Answer

Unfortunately, this data is generated by JavaScript when the browser renders the page. That's why the information isn't there when you download the source with urllib2. What you can do instead is query the backend directly:

>>> import urllib2
>>> import re

>>> url = "https://rrtp.comed.com/rrtp/ServletFeed?type=instant"
>>> s = urllib2.urlopen(url).read()
>>> print s
"4.5¢ per kWh 
5-minute Trend Price 7:40 PM CT
"

>>> float(re.findall(r"\d+\.\d+", s)[0])
4.5
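The same approach can be sketched for Python 3, where urllib2's functionality lives in urllib.request. This version assumes, as the answer's [0] does, that the first number in the response is the price, and it assumes the response is UTF-8 text. It also allows for a minus sign and for a whole-number price, which the answer's pattern does not; whether the feed ever reports such values is an assumption here. The endpoint is the one from the answer and may have changed since.

```python
import re
import urllib.request

# Endpoint from the answer above; it may no longer exist in this form.
FEED_URL = "https://rrtp.comed.com/rrtp/ServletFeed?type=instant"

# Optional minus sign, one or more digits, optional fractional part.
# Raw string, and the dot is escaped so it only matches a literal ".".
PRICE_RE = re.compile(r"-?\d+(?:\.\d+)?")


def parse_price(feed_text):
    """Return the first number in the feed text as a float (cents per kWh).

    Assumes, like the answer's [0], that the price is the first number.
    """
    match = PRICE_RE.search(feed_text)
    if match is None:
        raise ValueError("no price found in feed text: %r" % feed_text)
    return float(match.group())


def fetch_instant_price(url=FEED_URL, timeout=10):
    """Download the feed (assumed to be UTF-8 text) and parse the price."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        body = resp.read().decode("utf-8", errors="replace")
    return parse_price(body)
```

For example, parse_price('"4.5¢ per kWh 5-minute Trend Price 7:40 PM CT"') returns 4.5 without touching the network, while fetch_instant_price() performs the live request. Keeping parsing separate from fetching makes the regex easy to check against saved responses.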
