RoboArch

Reputation: 463

Getting text from HTML using BeautifulSoup

I'm trying to get the current "5 minute trend price" from my electricity provider's website using Python 2.7 and BeautifulSoup4.

The xpath is: xpath = "//html/body/div[2]/div/div/div[3]/p[1]"

or

<div class="instant prices">
  <p class="price">
    "5.2"  # this is what I'm ultimately after
    <small>¢</small>
    <strong> per kWh </strong>
  </p>
</div>

I've tried a myriad of different ways of getting the "5.2" value and have successfully been able to drill down to the "instant prices" object, but can't get anything from it.

My current code looks like this:

import urllib2
from bs4 import BeautifulSoup

url = "https://rrtp.comed.com/live-prices/"

soup = BeautifulSoup(urllib2.urlopen(url).read())
#print soup

instantPrices = soup.findAll('div', 'instant prices')
print instantPrices

...and the output is:

[<div class="instant prices">
</div>]
[]

No matter what, it appears that the "instant prices" object is empty even though I can clearly see it when inspecting the element in Chrome. Any help would be hugely appreciated! Thank you!

Upvotes: 2

Views: 398

Answers (1)

elyase

Reputation: 40993

Unfortunately, this data is generated by JavaScript when the browser renders the page. That's why it is missing when you download the source with urllib2. What you can do instead is query the backend directly:

>>> import urllib2
>>> import re

>>> url = "https://rrtp.comed.com/rrtp/ServletFeed?type=instant"
>>> s = urllib2.urlopen(url).read()
>>> s
"<p class='price'>4.5<small>&cent;</small><strong> per kWh </strong></p><p>5-minute Trend Price 7:40 PM&nbsp;CT</p>\r\n"

>>> # Escape the dot so it matches a literal decimal point, not any character
>>> float(re.findall(r"\d+\.\d+", s)[0])
4.5
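
For anyone on Python 3, where urllib2 no longer exists, the same idea might look roughly like the sketch below. It assumes the feed still returns a fragment shaped like the one shown above; the names parse_price, fetch_price, SAMPLE and FEED_URL are made up for illustration, and the endpoint itself may have changed or gone away since this was written. The pattern is anchored on the class='price' paragraph so it cannot pick up the "5" in "5-minute" or the time of day, and it also tolerates integer or negative values, which is an assumption about what the feed might send.

```python
import re
from urllib.request import urlopen

# URL taken from the answer above; it may no longer be live.
FEED_URL = "https://rrtp.comed.com/rrtp/ServletFeed?type=instant"

# Sample fragment copied from the answer above; the live feed may differ.
SAMPLE = ("<p class='price'>4.5<small>&cent;</small>"
          "<strong> per kWh </strong></p>"
          "<p>5-minute Trend Price 7:40 PM&nbsp;CT</p>\r\n")

# Matches the number right after the opening <p class='price'> tag.
PRICE_RE = re.compile(r"""<p class=['"]price['"]>\s*(-?\d+(?:\.\d+)?)""")


def parse_price(fragment):
    """Return the price in cents per kWh from a feed fragment, or None."""
    match = PRICE_RE.search(fragment)
    return float(match.group(1)) if match else None


def fetch_price(url=FEED_URL):
    """Download the fragment and parse it (hypothetical helper, needs network)."""
    with urlopen(url, timeout=10) as resp:
        return parse_price(resp.read().decode("utf-8", errors="replace"))


# Offline check against the sample fragment from the answer.
print(parse_price(SAMPLE))
```

Calling fetch_price() would hit the network; parse_price returns None instead of raising if the fragment does not contain a price paragraph, so the caller can decide how to handle a changed page.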

Upvotes: 2
