Flux Capacitor

Reputation: 1231

Python+BeautifulSoup: scraping a particular table from a webpage

I'm trying to scrape a particular table from this webpage: http://biz.yahoo.com/c/s.html

What I want to scrape is the stock information: the dates, the company name, the ratio, and whether or not the stock is optionable.

Here's what I have so far:

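from urllib.request import urlopen  # Python 3; in Python 2 this was urllib2.urlopen
from bs4 import BeautifulSoup

url = "http://biz.yahoo.com/c/s.html"
page = urlopen(url)
# Name the parser explicitly; otherwise bs4 guesses one and warns
soup = BeautifulSoup(page.read(), "html.parser")

# Every <table> element on the page, in document order
alltables = soup.find_all('table')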

This code gives me all the tables on the page (there is more than one).

1) I'm not sure how to identify the table that I need.

2) I'm not sure how to extract the info from that table into an array or list or some other data structure I can use for further analysis.
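One way to approach question 1 is to print a short preview of every table and see which index holds the data. The sketch below assumes the page looks roughly like SAMPLE_HTML, which is hypothetical markup: only the "Payable" header comes from this thread, and the other column names and the helper table_previews are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page: a layout table followed by the
# data table, with no ids or classes to tell them apart.
SAMPLE_HTML = """
<table><tr><td>Navigation</td></tr></table>
<table>
  <tr><td><b>Payable</b></td><td><b>Ex-Date</b></td><td><b>Company</b></td>
      <td><b>Ratio</b></td><td><b>Optionable?</b></td></tr>
  <tr><td colspan="5"><hr></td></tr>
  <tr><td>Jan 10</td><td>Jan 11</td><td>Example Corp</td><td>2-1</td><td>Yes</td></tr>
</table>
"""


def table_previews(soup, width=60):
    """Return (index, first-row text) for every <table>, in document order."""
    previews = []
    for index, table in enumerate(soup.find_all("table")):
        first_row = table.find("tr")
        text = first_row.get_text(" ", strip=True) if first_row else ""
        previews.append((index, text[:width]))
    return previews


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
for index, text in table_previews(soup):
    print(index, text)
```

Once you know which preview looks like the header you want, you can pick that table by index, although an index is fragile if the page layout changes.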

Upvotes: 1

Views: 2399

Answers (1)

alecxe

Reputation: 474191

The markup is not easy to scrape: there are no id or distinctive class attributes you can use to tell the tables apart. In this case, I would find the "Payable" header cell and then take its nearest enclosing table:

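# Match the <b> tag whose text is exactly "Payable" (older bs4 versions spell this text=)
header = soup.find("b", string="Payable")
# Walk up to the nearest enclosing <table>
table = header.find_parent("table")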
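To see why find_parent fits here, the minimal sketch below uses hypothetical markup (NESTED_HTML, not the real page) in which the data table is nested inside an outer layout table. find_parent("table") stops at the nearest enclosing table, so the outer layout table is not returned. Whether the real page nests its tables this way is an assumption.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the data table sits inside an outer layout table.
NESTED_HTML = """
<table><tr><td>
  <table>
    <tr><td><b>Payable</b></td><td><b>Company</b></td></tr>
    <tr><td colspan="2"><hr></td></tr>
    <tr><td>Jan 10</td><td>Example Corp</td></tr>
  </table>
</td></tr></table>
"""

soup = BeautifulSoup(NESTED_HTML, "html.parser")

# Match the <b> tag whose entire text is "Payable", then walk up the tree.
header = soup.find("b", string="Payable")
table = header.find_parent("table")   # nearest enclosing <table>: the data table
outer = table.find_parent("table")    # the layout table around it

print(len(table.find_all("tr")))  # 3: header, divider, one data row
print(len(outer.find_all("tr")))  # 4: find_all also descends into the nested table
```

The second count shows why grabbing an outer table and iterating its rows would mix layout rows with data rows.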

Then iterate over the table rows, skipping the first two (the header row and the divider row):

for row in table.find_all("tr")[2:]:
    print([cell.get_text(strip=True) for cell in row.find_all("td")])
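Slicing with [2:] assumes the header and the divider are always exactly the first two rows. As an alternative technique (not the answer's), the sketch below treats row 0 as the header and then drops any row whose cells are all empty, such as an <hr> divider. SAMPLE_HTML and data_rows are hypothetical names for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical data table: header row, <hr> divider, two data rows,
# and a second divider that position-based slicing would not catch.
SAMPLE_HTML = """
<table>
  <tr><td><b>Payable</b></td><td><b>Company</b></td><td><b>Ratio</b></td></tr>
  <tr><td colspan="3"><hr></td></tr>
  <tr><td>Jan 10</td><td>Example Corp</td><td>2-1</td></tr>
  <tr><td colspan="3"><hr></td></tr>
  <tr><td>Jan 12</td><td>Sample Inc</td><td>3-2</td></tr>
</table>
"""


def data_rows(table):
    """Return the cell texts of every non-empty row after the header row."""
    rows = []
    for row in table.find_all("tr")[1:]:  # assume row 0 is the header
        cells = [cell.get_text(strip=True) for cell in row.find_all("td")]
        if any(cells):  # divider rows yield only empty strings
            rows.append(cells)
    return rows


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
table = soup.find("b", string="Payable").find_parent("table")
for cells in data_rows(table):
    print(cells)
```

The trade-off is that a genuinely blank data row would also be dropped, which may or may not be what you want.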

To keep the data for further analysis, collect the rows into a list of lists instead:

rows = [[cell.get_text(strip=True)
         for cell in row.find_all("td")]
        for row in table.find_all("tr")[2:]]
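For further analysis it can be handier to key each value by its column name. The minimal sketch below assumes, as the answer does, that row 0 holds the column names and row 1 is the divider. SAMPLE_HTML and the column names other than "Payable" are hypothetical, as is the "Yes"/"No" format of the optionable column.

```python
from bs4 import BeautifulSoup

# Hypothetical data table in the same shape the answer describes.
SAMPLE_HTML = """
<table>
  <tr><td><b>Payable</b></td><td><b>Company</b></td><td><b>Ratio</b></td><td><b>Optionable?</b></td></tr>
  <tr><td colspan="4"><hr></td></tr>
  <tr><td>Jan 10</td><td>Example Corp</td><td>2-1</td><td>Yes</td></tr>
  <tr><td>Jan 12</td><td>Sample Inc</td><td>3-2</td><td>No</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
table = soup.find("b", string="Payable").find_parent("table")
rows = table.find_all("tr")

# Column names come from the header row (row 0); row 1 is the divider.
headers = [cell.get_text(strip=True) for cell in rows[0].find_all("td")]
records = [
    dict(zip(headers, (cell.get_text(strip=True) for cell in row.find_all("td"))))
    for row in rows[2:]
]

# Example query: companies whose "Optionable?" column reads "Yes".
optionable = [r["Company"] for r in records if r["Optionable?"] == "Yes"]
print(optionable)
```

If pandas is installed, pd.DataFrame(records) turns this list of dicts into a DataFrame with one column per header.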

Upvotes: 5
