Reputation: 63
I am quite new to web scraping and Python. I was trying to make a script that gets the Last Trade Price from http://finra-markets.morningstar.com/BondCenter/BondDetail.jsp?symbol=NFLX4333665&ticker=C647273 but some content seems to be missing when I request it with Python. I have made scripts that got data from other websites successfully before, but I can't seem to get my code to work on this website.
This is my code so far:
from bs4 import BeautifulSoup
import requests
r = requests.get("http://finra-markets.morningstar.com/BondCenter/BondDetail.jsp?symbol=NFLX4333665&ticker=C647273")
c = r.content
soup = BeautifulSoup(c, "html.parser")
all = soup.find_all("div", {"class": "gr_row_a5"})
print(soup)
When I run this, most of the important data is missing.
Any help would be much appreciated.
Upvotes: 6
Views: 9233
Reputation: 571
Be careful with iframes
I have observed that the div with class="gr_row_a5" is placed inside an iframe. To crawl data inside an iframe, you need to switch into that iframe first and then take the page source.
from selenium import webdriver
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup

# Start a Chrome session (needs Chrome and a matching driver installed)
browser = webdriver.Chrome()
browser.delete_all_cookies()
browser.get('http://finra-markets.morningstar.com/BondCenter/BondDetail.jsp?symbol=NFLX4333665&ticker=C647273')
# Switch into the iframe that holds the bond details
iframe = browser.find_element(By.ID, 'ms-bond-detail-iframe')
browser.switch_to.frame(iframe)
# After the switch, page_source returns the HTML of the current frame
c = browser.page_source
browser.quit()
soup = BeautifulSoup(c, "html.parser")
# Named rows rather than all, to avoid shadowing the built-in all()
rows = soup.find_all("div", {"class": "gr_row_a5"})
print(rows)
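Before starting a browser, it can be worth checking whether the raw HTML already names the iframe and its URL. If it does, you may be able to request that URL directly. The following is only a sketch using the standard library. The sample markup and its src value are made up for illustration (only the id ms-bond-detail-iframe comes from the page above), and on the real site the src may be set by JavaScript, in which case this check finds nothing useful.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class IframeFinder(HTMLParser):
    """Collect the id and src of every <iframe> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.iframes = []

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":
            attr_map = dict(attrs)
            self.iframes.append((attr_map.get("id"), attr_map.get("src")))


def find_iframes(html, base_url):
    """Return (id, absolute_src) pairs for each iframe found in html."""
    finder = IframeFinder()
    finder.feed(html)
    return [
        (frame_id, urljoin(base_url, src) if src else None)
        for frame_id, src in finder.iframes
    ]


# Hypothetical markup standing in for the real page source
sample_html = (
    '<html><body>'
    '<iframe id="ms-bond-detail-iframe" src="/detail.jsp?x=1"></iframe>'
    '</body></html>'
)
frames = find_iframes(sample_html, "http://example.com/BondCenter/BondDetail.jsp")
print(frames)
```

In practice you would pass r.text from requests.get(...) in place of sample_html. If the list comes back empty or the src is None, the Selenium approach above is the safer route.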
Hope this solves your problem. If not, kindly let me know. Thanks
Upvotes: 5
Reputation: 571
Some web pages fill in the data using JavaScript, and what appears to be the page content is not actually in the HTML that Beautiful Soup is processing. This is one of those pages.
This is confusing because if you inspect the displayed page with web developer tools in (say) Safari or Chrome, you find the HTML that has been rendered into the DOM. However, if you look at the page source, you won't find it at all.
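One way to confirm this for a given page is to count how often the class you are after appears in the HTML you actually downloaded. The sketch below uses only the standard library. The two HTML strings are invented stand-ins for "page source" and "rendered DOM", not copies of the real page.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count start tags whose class attribute contains a given class name."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # The class attribute may be missing or hold several names
        classes = (dict(attrs).get("class") or "").split()
        if self.wanted_class in classes:
            self.count += 1


def count_class(html, wanted_class):
    """Return how many elements in html carry wanted_class."""
    counter = ClassCounter(wanted_class)
    counter.feed(html)
    return counter.count


# Hypothetical examples: what the server sends vs. what the browser shows
raw_source = '<div id="app"></div><script src="app.js"></script>'
rendered_dom = '<div id="app"><div class="gr_row_a5 first">Last Trade Price</div></div>'

print(count_class(raw_source, "gr_row_a5"))    # 0
print(count_class(rendered_dom, "gr_row_a5"))  # 1
```

If the count is zero for the text returned by requests.get(...) but the element is visible in the browser's inspector, the content is being added after the initial HTML is delivered.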
So for this page, you can't parse out the data with Beautiful Soup alone. One alternative would be a site that gives you the data in a more direct way. Another might be to try the requests-html library, which can run JavaScript, and then you can scrape data from the rendered HTML. (Note: I have never tried requests-html myself, and one should be careful about running JavaScript in this way, but it's a plausible way to do it.) There are also projects where people have used Selenium or something similar as a way to get the HTML to scrape. But requests-html looks like the most straightforward thing to try.
Upvotes: 7