How to extract the price for the security as text from the website through Python Selenium BeautifulSoup

Question

I am trying to get the price of the security shown at https://investor.vanguard.com/529-plan/profile/4514 . I run this code:

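from selenium import webdriver
from selenium.webdriver.firefox.service import Service
from bs4 import BeautifulSoup

# Selenium 4 takes the geckodriver path through a Service object (executable_path is deprecated)
driver = webdriver.Firefox(service=Service(r'C:\Program_Files_EllieTheGoodDog\Geckodriver\geckodriver.exe'))
driver.get('https://investor.vanguard.com/529-plan/profile/4514')
html = driver.page_source  # read as soon as get() returns
soup = BeautifulSoup(html, 'lxml')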

When I use "Inspect Element" on the price in the Selenium-opened Firefox window, I clearly see this:

$42.91

But that data is NOT in my soup. If I print my soup, the HTML is quite different from what is shown on the website. I tried this, but it fails completely:

myspan = soup.find_all('span', attrs={'data-ng-if': '!data.isLayer', 'data-ng-bind-html': 'data.value', 'data-ng-class': '{sceIsLayer : isETF, arrange : isMutualFund, arrangeSec : isETF}', 'class': 'ng-scope ng-binding arrange'})

I am totally stumped. If anyone could point me in the right direction, I would really appreciate it. I sense I am missing something, possibly several things.

Bitto · Accepted Answer

There is nothing wrong with the way you are using the data-* attributes and their values to select the span. In fact, passing them through the attrs dictionary is the correct method, as mentioned in the documentation. There are 4 span tags that match all the attributes, and find_all will return all of them. The second one corresponds to the price.
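To see this matching behaviour without a browser, here is a minimal sketch that runs find_all over a static snippet. The markup is hypothetical: it is modelled on the four spans described above, with texts borrowed from the output printed further down, and it is not copied from the live page. The class filter from the question is left out to keep the example short.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down markup: four spans share the same data-* attributes,
# and the second one holds the price.
HTML = """
<table><tr>
  <td><span data-ng-if="!data.isLayer" data-ng-bind-html="data.value"
            class="ng-scope ng-binding arrange">Unit price as of 02/15/2019</span></td>
  <td><span data-ng-if="!data.isLayer" data-ng-bind-html="data.value"
            class="ng-scope ng-binding arrange">$42.91</span></td>
  <td><span data-ng-if="!data.isLayer" data-ng-bind-html="data.value"
            class="ng-scope ng-binding arrange">Change</span></td>
  <td><span data-ng-if="!data.isLayer" data-ng-bind-html="data.value"
            class="ng-scope ng-binding arrange">$0.47 1.11%</span></td>
</tr></table>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Hyphenated names like data-ng-if cannot be Python keyword arguments,
# so they are passed through the attrs dictionary instead.
spans = soup.find_all("span", attrs={
    "data-ng-if": "!data.isLayer",
    "data-ng-bind-html": "data.value",
})
print(len(spans))     # 4
print(spans[1].text)  # $42.91
```

All four spans match, so the price has to be picked out by position, exactly as with the real page source.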

What you missed is that the span is rendered by JavaScript (the data-ng-* attributes are AngularJS bindings), so it takes some time to appear, and the page source is returned before that happens. You can explicitly wait for that span and then get the page source. Here I am using XPath to wait for the element. You can get the XPath in the inspector by right-clicking the element and choosing Copy -> Copy XPath.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup

driver = webdriver.Firefox()
driver.get('https://investor.vanguard.com/529-plan/profile/4514')
# Block for up to 10 seconds until the price span exists in the DOM
WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, '/html/body/div[1]/div[3]/div[3]/div[1]/div/div[1]/div/div/div/div[2]/div/div[3]/div[1]/div/div/table/tbody/tr[1]/td[2]/div/span[1]')))
# Only now does page_source contain the rendered spans
html = driver.page_source
driver.quit()
soup = BeautifulSoup(html, 'html.parser')
myspan = soup.find_all('span', attrs={'data-ng-if': '!data.isLayer', 'data-ng-bind-html': 'data.value', 'data-ng-class': '{sceIsLayer : isETF, arrange : isMutualFund, arrangeSec : isETF}', 'class': 'ng-scope ng-binding arrange'})
print(myspan)
print(myspan[1].text)  # the second matching span holds the price

Output

[Unit price as of 02/15/2019, $42.91, Change, $0.47 1.11%]
$42.91
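`myspan[1]` relies on the price being the second match. Since the output above alternates label and value, a slightly more self-describing sketch pairs them up and looks the price up by its label. This assumes the label/value alternation seen in this one output also holds on other profile pages; `label_value_pairs` is a hypothetical helper, and the texts list is copied from the output rather than scraped.

```python
def label_value_pairs(texts):
    """Pair alternating strings: [l1, v1, l2, v2] -> {l1: v1, l2: v2}.

    zip() stops at the shorter slice, so an unpaired trailing label is dropped.
    """
    return dict(zip(texts[0::2], texts[1::2]))


# In the answer's script this would be: texts = [s.text for s in myspan]
texts = ["Unit price as of 02/15/2019", "$42.91", "Change", "$0.47 1.11%"]
pairs = label_value_pairs(texts)

# The label carries a date that changes daily, so match on its prefix.
price = next(v for k, v in pairs.items() if k.startswith("Unit price"))
print(price)  # $42.91
```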
