Reputation: 2062
I am writing a small program to fetch stock exchange data using Python. The sample code below makes a request to a URL and it should return the appropriate data. Here is the resource that I am using: https://python.plainenglish.io/4-python-libraries-to-help-you-make-money-from-webscraping-57ba6d8ce56d
from xml.dom.minidom import Element
from selenium import webdriver
from bs4 import BeautifulSoup
import logging
from selenium.webdriver.common.by import By
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
url = "http://eoddata.com/stocklist/NASDAQ/A.htm"
driver = webdriver.Chrome(executable_path=r"C:\Program Files\Chrome\chromedriver")
page = driver.get(url)
# TODO: find element by CSS selector
stock_symbol = driver.find_elements(by=By.CSS_SELECTOR, value='#ctl00_cph1_divSymbols')
soup = BeautifulSoup(driver.page_source, features="html.parser")
elements = []
table = soup.find('div', {'id','ct100_cph1_divSymbols'})
logging.info(f"{table}")
I've added a todo for getting the element that I am trying to retrieve from the program.
Expected: The proper data should be returned.
Actual: Nothing is returned.
Upvotes: 0
Views: 787
Reputation: 25196
It is common practice to scrape tables with pandas.read_html() to get their text, so I would also recommend it.
But to answer your question and follow your approach, select the <div> and the <table> inside it more specifically:
soup.select('#ctl00_cph1_divSymbols table')
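As a side note on why the `soup.find(...)` call in the question comes back empty: it passes a set literal (`{'id','ct100_cph1_divSymbols'}`) where `find()` expects a dict mapping attribute names to values, and the id is spelled `ct100` (digit one) instead of `ctl00` (letter l). The sketch below shows the difference on a small, hypothetical HTML snippet rather than the live page; only the id and the `/stockquote/...` link shape are taken from the thread.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page; only the div id matches the site.
html = """
<div id="ctl00_cph1_divSymbols">
  <table>
    <tr><th>Code</th><th>Name</th></tr>
    <tr><td><a href="/stockquote/NASDAQ/AACG.htm">AACG</a></td><td>Ata Creativity Global ADR</td></tr>
  </table>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# attrs passed as a dict, with the id spelled "ctl00" (letter l)
by_find = soup.find("div", {"id": "ctl00_cph1_divSymbols"})

# the same lookup as a CSS selector, narrowed to the table inside the div
tables = soup.select("#ctl00_cph1_divSymbols table")

# the misspelled id from the question matches nothing
misspelled = soup.find("div", {"id": "ct100_cph1_divSymbols"})
```

Here `by_find` is the `<div>` element, `tables` holds the one nested table, and `misspelled` is `None`.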
To get and store the data, you could iterate over the rows and append the results to a list:
data = []
for row in soup.select('#ctl00_cph1_divSymbols table tr:has(td)'):
    d = dict(zip(soup.select_one('#ctl00_cph1_divSymbols table tr:has(th)').stripped_strings,row.stripped_strings))
    d.update({'url': 'https://eoddata.com'+row.a.get('href')})
    data.append(d)
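To make the `dict(zip(...))` step in the loop above easier to follow, here is a minimal sketch with plain lists standing in for the two `.stripped_strings` generators. The values are copied from the first row of the output table in this answer; the `href` value is an assumption inferred from the resulting URL.

```python
# Plain lists standing in for the header and row .stripped_strings generators
header = ["Code", "Name", "High", "Low", "Close", "Volume", "Change"]
row = ["AACG", "Ata Creativity Global ADR", "1.390", "1.360", "1.380", "8,900", "0"]

# Assumed relative link, as row.a.get('href') would return it
href = "/stockquote/NASDAQ/AACG.htm"

# zip pairs each header with the cell at the same position
record = dict(zip(header, row))

# the link is not a text cell, so it is added separately
record.update({"url": "https://eoddata.com" + href})
```

One thing to keep in mind: `zip` stops at the shorter input and `.stripped_strings` skips empty strings, so a row with an empty cell would shift the remaining values one column to the left.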
Example:
from bs4 import BeautifulSoup
import requests
import pandas as pd

url = "https://eoddata.com/stocklist/NASDAQ/A.htm"
res = requests.get(url)
# name the parser explicitly to avoid bs4's "no parser was explicitly specified" warning
soup = BeautifulSoup(res.text, 'html.parser')

data = []
for row in soup.select('#ctl00_cph1_divSymbols table tr:has(td)'):
    # pair each header cell with the cell at the same position in this row
    d = dict(zip(soup.select_one('#ctl00_cph1_divSymbols table tr:has(th)').stripped_strings,row.stripped_strings))
    d.update({'url': 'https://eoddata.com'+row.a.get('href')})
    data.append(d)

pd.DataFrame(data)
 | Code | Name | High | Low | Close | Volume | Change | url |
---|---|---|---|---|---|---|---|---|
0 | AACG | Ata Creativity Global ADR | 1.390 | 1.360 | 1.380 | 8,900 | 0 | https://eoddata.com/stockquote/NASDAQ/AACG.htm |
1 | AACI | Armada Acquisition Corp I | 9.895 | 9.880 | 9.880 | 5,400 | -0.001 | https://eoddata.com/stockquote/NASDAQ/AACI.htm |
2 | AACIU | Armada Acquisition Corp I | 9.960 | 9.960 | 9.960 | 300 | -0.01 | https://eoddata.com/stockquote/NASDAQ/AACIU.htm |
3 | AACIW | Armada Acquisition Corp I WT | 0.1900 | 0.1699 | 0.1700 | 36,400 | -0.0193 | https://eoddata.com/stockquote/NASDAQ/AACIW.htm |
4 | AADI | Aadi Biosciences Inc | 13.40 | 12.66 | 12.90 | 98,500 | -0.05 | https://eoddata.com/stockquote/NASDAQ/AADI.htm |
5 | AADR | Advisorshares Dorsey Wright ETF | 47.49 | 46.82 | 47.49 | 1,100 | 0.3 | https://eoddata.com/stockquote/NASDAQ/AADR.htm |
6 | AAL | American Airlines Gp | 14.44 | 13.70 | 14.31 | 45,193,100 | -0.46 | https://eoddata.com/stockquote/NASDAQ/AAL.htm |
...
Upvotes: 1