user131983

Reputation: 3937

Figuring out how to web scrape with BeautifulSoup

I am trying to scrape the table that has "Periods" and "percent per annum" as columns (Table 4) from the URL used in the code below.

My code is below. I think I am confused about how to refer to the row just above the first date and its corresponding number, which is why I get the error AttributeError: 'NoneType' object has no attribute 'getText' on the line row_name = row.findNext('td.header_units').getText().

from bs4 import BeautifulSoup
import urllib2 

url = "http://sdw.ecb.europa.eu/browseTable.do?node=qview&SERIES_KEY=165.YC.B.U2.EUR.4F.G_N_A.SV_C_YM.SR_30Y"

content = urllib2.urlopen(url).read()
soup = BeautifulSoup(content)

desired_table = soup.findAll('table')[4]

# Find the columns you want data from
headers1 = desired_table.findAll('td.header_units')
headers2 = desired_table.findAll('td.header')
desired_columns = []
for th in headers1: #I'm just working with `headers1` currently to see if I have the right idea
    desired_columns.append([headers1.index(th), th.getText()])

# Iterate through each row grabbing the data from the desired columns
rows = desired_table.findAll('tr')

for row in rows[1:]:
    cells = row.findAll('td')
    row_name = row.findNext('td.header_units').getText()
    for column in desired_columns:
        print(cells[column[0]].text.encode('ascii', 'ignore'), row_name.encode('ascii', 'ignore'), column[1].encode('ascii', 'ignore'))

Thank You

Upvotes: 2

Views: 64

Answers (1)

Padraic Cunningham

Reputation: 180441

This collects the two cells of each table row as a (period, value) tuple:

from bs4 import BeautifulSoup
import requests

r = requests.get(
    "http://sdw.ecb.europa.eu/browseTable.do?node=qview&SERIES_KEY=165.YC.B.U2.EUR.4F.G_N_A.SV_C_YM.SR_30Y")
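# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(r.content, "html.parser")

# Every <tr> that follows the first <td class="header"> in the stats table
data = iter(soup.find("table", {"class": "tablestats"}).find("td", {"class": "header"}).find_all_next("tr"))

# The first two rows that follow are consumed here as headers
headers = (next(data).text, next(data).text)
# Each remaining row must hold exactly two <td> cells, or the unpacking raises ValueError
table_items = [(a.text, b.text) for ele in data for a, b in [ele.find_all("td")]]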

for a, b in table_items:
    print(u"Period={}, Percent per annum={}".format(a, b if b.strip() else "null"))

Output:

Period=2015-06-09, Percent per annum=1.842026
Period=2015-06-08, Percent per annum=1.741636
Period=2015-06-07, Percent per annum=null
Period=2015-06-06, Percent per annum=null
Period=2015-06-05, Percent per annum=1.700042
Period=2015-06-04, Percent per annum=1.667431
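
As for the AttributeError in the question: the first argument to find, find_all (findAll) and find_next (findNext) is a tag name, not a CSS selector, so 'td.header_units' searches for a tag literally named "td.header_units", finds nothing, and returns None. The sketch below shows the difference on a small inline table. The HTML is a simplified stand-in that I made up for illustration, not the real ECB markup; only the class names are taken from the question and the code above.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the page's markup (an assumption,
# not the real HTML); only the class names come from the question.
SAMPLE_HTML = """
<table class="tablestats">
  <tr><td class="header">Period</td><td class="header">Value</td></tr>
  <tr><td class="header_units"></td><td class="header_units">Percent per annum</td></tr>
  <tr><td>2015-06-09</td><td>1.842026</td></tr>
  <tr><td>2015-06-07</td><td></td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
table = soup.find("table", class_="tablestats")

# A tag-name search: no tag is called "td.header_units", so nothing matches.
no_match = table.find("td.header_units")    # None, so .getText() would fail
empty = table.find_all("td.header_units")   # empty result list

# Two ways to match <td class="header_units">:
by_class = table.find_all("td", class_="header_units")
by_css = table.select("td.header_units")    # select() does take CSS selectors

units = [td.get_text(strip=True) for td in by_css]

# Keep only rows with exactly two cells and no class attribute on either,
# which skips the header rows in this sample.
rows = []
for tr in table.find_all("tr"):
    cells = tr.find_all("td")
    if len(cells) != 2 or any(td.get("class") for td in cells):
        continue
    period, value = (td.get_text(strip=True) for td in cells)
    rows.append((period, value or "null"))

print(units)
print(rows)
```

Filtering rows by cell count avoids the ValueError that tuple unpacking raises on a row with a different number of cells, at the cost of silently skipping such rows.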

Upvotes: 1
