I'm trying to scrape a particular table from this webpage: http://biz.yahoo.com/c/s.html
What I want to scrape is the stock information: the dates, company name, ratio, and whether or not it is optionable.
Here's what I have so far:
from bs4 import BeautifulSoup
import urllib2
url = "http://biz.yahoo.com/c/s.html"
page = urllib2.urlopen(url)
soup = BeautifulSoup(page.read(), "html.parser")
alltables = soup.find_all('table')
This code gives me all the tables on the page (there is more than one).
1) I'm not sure how to identify the table that I need.
2) I'm not sure how to extract the info from that table into an array or list or some other data structure I can use for further analysis.
The markup is not exactly easy to scrape: there are no ids or specific class attributes that you can use to distinguish the tables from one another. What I would do in this case is find the Payable header cell and then its nearest enclosing table:
header = soup.find("b", text="Payable")
table = header.find_parent("table")
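To see why this picks out the right table even when the page has several, here is a minimal offline sketch on a made-up HTML snippet. It assumes BeautifulSoup 4.4 or later (where `string=` is the newer name for the `text=` argument) and the built-in "html.parser". The markup, column headings, and company name are hypothetical stand-ins for the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the real page: two tables with no
# id or class attributes; only the second one has a "Payable" header cell.
SAMPLE_HTML = """
<table><tr><td>Navigation</td><td>Links</td></tr></table>
<table>
  <tr><td><b>Payable</b></td><td><b>Ex-Date</b></td><td><b>Company</b></td>
      <td><b>Ratio</b></td><td><b>Optionable?</b></td></tr>
  <tr><td colspan="5"><hr></td></tr>
  <tr><td>Jan 10</td><td>Jan 11</td><td>Example Corp</td><td>2-1</td><td>Y</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Locate the <b> element whose text is exactly "Payable" ...
header = soup.find("b", string="Payable")
# ... then walk up the tree to the nearest enclosing <table>.
table = header.find_parent("table")

print(len(soup.find_all("table")))  # 2 tables in the snippet
print(len(table.find_all("tr")))    # 3 rows: header, divider, data
```

On the live page it is worth printing the matched table once, since nested layout tables could make the nearest enclosing table a different one than expected.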
Then you can iterate over the table rows, skipping the first two (the header row and the divider row):
for row in table.find_all("tr")[2:]:
    print([cell.get_text(strip=True) for cell in row.find_all("td")])
Or you can collect the rows into a list of lists:
[[cell.get_text(strip=True)
for cell in row.find_all("td")]
for row in table.find_all("tr")[2:]]
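For the "further analysis" part of the question, one option is to pair each row with column names so fields can be accessed by name. This is a plain-Python sketch: the column names and their order in `COLUMNS` are assumptions (check them against the real header row), and `rows` is a hypothetical sample of what the list-of-lists comprehension might return.

```python
from typing import Dict, List

# Assumed column names and order; verify against the table's actual header cells.
COLUMNS = ["payable", "ex_date", "company", "ratio", "optionable"]


def rows_to_records(rows: List[List[str]], columns: List[str] = COLUMNS) -> List[Dict[str, str]]:
    """Turn each row into a dict keyed by column name.

    Rows whose width does not match the column list (e.g. a stray divider
    row that yields a single empty cell) are dropped.
    """
    return [dict(zip(columns, row)) for row in rows if len(row) == len(columns)]


# Hypothetical sample of the list-of-lists output.
rows = [
    ["Jan 10", "Jan 11", "Example Corp", "2-1", "Y"],
    ["Jan 12", "Jan 13", "Sample Inc", "3-2", "N"],
    [""],
]

records = rows_to_records(rows)
optionable = [r["company"] for r in records if r["optionable"] == "Y"]
print(len(records))  # 2
print(optionable)    # ['Example Corp']
```

Reading the column names from the table's first row instead of hardcoding them would make this less brittle if the page layout changes.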