Ellie The Good Dog

Reputation: 127

How to extract the price for the security as text from the website through Python Selenium BeautifulSoup

I am trying to simply get the price for the security shown at https://investor.vanguard.com/529-plan/profile/4514 . I run this code:

from selenium import webdriver
from bs4 import BeautifulSoup

driver = webdriver.Firefox(executable_path=r'C:\Program_Files_EllieTheGoodDog\Geckodriver\geckodriver.exe')
driver.get('https://investor.vanguard.com/529-plan/profile/4514')
html = driver.page_source
soup = BeautifulSoup(html, 'lxml')

When I "inspect element" the price in the selenium-opened Firefox, I clearly see this:

<span data-ng-if="!data.isLayer" data-ng-bind-html="data.value" data-ng-class="{sceIsLayer : isETF, arrange : isMutualFund, arrangeSec : isETF}" class="ng-scope ng-binding arrange">$42.91</span>

But that data is NOT in my soup. If I print my soup, the html is really quite different from that shown on the website. I tried this, but it totally fails:

myspan = soup.find_all('span', attrs={'data-ng-if': '!data.isLayer', 'data-ng-bind-html': 'data.value', 'data-ng-class': '{sceIsLayer : isETF, arrange : isMutualFund, arrangeSec : isETF}', 'class': 'ng-scope ng-binding arrange'})

I am totally stumped. If anyone could point me in the right direction, I would really appreciate it. I sense I am totally missing something, possibly several things...

Upvotes: 4

Views: 481

Answers (2)

Bitto

Reputation: 8215

There is nothing wrong with the way you are using the data-* attributes and their values to select the span. In fact, it is the method described in the BeautifulSoup documentation. Four span tags match all of those attributes, and find_all returns every one of them. The second one holds the price.

What you missed is that the span takes some time to load, and the page source is returned before that happens. You can explicitly wait for the span and only then get the page source. Here I am using an XPath to wait for the element. You can get the XPath from the browser's inspect tool: right-click the element -> Copy -> Copy XPath.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup

driver = webdriver.Firefox()
driver.get('https://investor.vanguard.com/529-plan/profile/4514')
# Wait up to 10 seconds for the price span to be present in the DOM
WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, '/html/body/div[1]/div[3]/div[3]/div[1]/div/div[1]/div/div/div/div[2]/div/div[3]/div[1]/div/div/table/tbody/tr[1]/td[2]/div/span[1]')))
# Read the page source only after the span has been rendered
html = driver.page_source
soup = BeautifulSoup(html, 'html.parser')
myspan = soup.find_all('span', attrs={'data-ng-if': '!data.isLayer', 'data-ng-bind-html': 'data.value', 'data-ng-class': '{sceIsLayer : isETF, arrange : isMutualFund, arrangeSec : isETF}', 'class': 'ng-scope ng-binding arrange'})
print(myspan)
# The second of the four matching spans holds the price
print(myspan[1].text)

Output

[<span class="ng-scope ng-binding arrange" data-ng-bind-html="data.value" data-ng-class="{sceIsLayer : isETF, arrange : isMutualFund, arrangeSec : isETF}" data-ng-if="!data.isLayer">Unit price as of 02/15/2019</span>, <span class="ng-scope ng-binding arrange" data-ng-bind-html="data.value" data-ng-class="{sceIsLayer : isETF, arrange : isMutualFund, arrangeSec : isETF}" data-ng-if="!data.isLayer">$42.91</span>, <span class="ng-scope ng-binding arrange" data-ng-bind-html="data.value" data-ng-class="{sceIsLayer : isETF, arrange : isMutualFund, arrangeSec : isETF}" data-ng-if="!data.isLayer">Change</span>, <span class="ng-scope ng-binding arrange" data-ng-bind-html="data.value" data-ng-class="{sceIsLayer : isETF, arrange : isMutualFund, arrangeSec : isETF}" data-ng-if="!data.isLayer"><span class="number-positive">$0.47</span> <span class="number-positive">1.11%</span></span>]
$42.91
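
Note that myspan[1] depends on the order of the four matching spans, which could change if the page layout does. As a rough sketch of an alternative, the price can be picked by its shape rather than its position. The helper name pick_price and the regular expression are my own assumptions for illustration (they are not part of BeautifulSoup or Selenium), and the sample texts are the span texts from the output above. With the real page you would pass [s.get_text() for s in myspan] instead.

```python
import re

# Assumed price format: a dollar sign followed by digits, with optional
# thousands separators and decimals, e.g. "$42.91" or "$1,234.50".
PRICE_RE = re.compile(r"^\$\d[\d,]*(?:\.\d+)?$")


def pick_price(texts):
    """Return the first text that looks like a dollar price, or None."""
    for text in texts:
        candidate = text.strip()
        if PRICE_RE.match(candidate):
            return candidate
    return None


# Texts of the four spans shown in the output above.
texts = ["Unit price as of 02/15/2019", "$42.91", "Change", "$0.47 1.11%"]
print(pick_price(texts))
```

The pattern is anchored at both ends, so the "Change" span, whose text also starts with a dollar amount but continues with a percentage, is not mistaken for the price.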

Upvotes: 2

undetected Selenium

Reputation: 193088

Selenium alone can be sufficient to extract the desired text. You need to induce WebDriverWait with visibility_of_element_located, and you can use the following solution:

  • Code Block:

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    
    driver = webdriver.Firefox(executable_path=r'C:\Utility\BrowserDrivers\geckodriver.exe')
    driver.get('https://investor.vanguard.com/529-plan/profile/4514')
    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//tr[@class='ng-scope']//td[@class='ng-scope right']//span[@class='ng-scope ng-binding arrange' and @data-ng-bind-html]"))).get_attribute("innerHTML"))
    
  • Console Output:

    $42.91
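
To make it clearer what the explicit wait is doing, here is a simplified sketch of the idea in plain Python: call a condition repeatedly until it returns something truthy or a timeout expires. The function wait_until is a hypothetical stand-in written for illustration, not Selenium's actual implementation. The real WebDriverWait also polls every 0.5 seconds by default, but it additionally ignores NoSuchElementException while polling and raises Selenium's own TimeoutException.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() until it returns a truthy value or timeout expires.

    Hypothetical helper illustrating the polling idea behind an explicit wait.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            # The condition is satisfied: hand back whatever it produced.
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll)
```

This is why reading page_source right after driver.get() can miss the price: nothing has polled for the span yet, so the source is captured before the page's scripts have filled it in.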
    

Upvotes: 1
