I'm trying to loop through an array of URLs and scrape the board members of a list of companies. There seems to be a problem with my loop below: it only processes the first element in the array and then duplicates its results. Any help with this would be appreciated. Code:
```python
from bs4 import BeautifulSoup
import requests

# array of URLs to loop through, will be larger once I get the loop working correctly
tickers = ['http://www.reuters.com/finance/stocks/companyOfficers?symbol=AAPL.O', 'http://www.reuters.com/finance/stocks/companyOfficers?symbol=GOOG.O']
board_members = []
output = []
soup = BeautifulSoup(html, "html.parser")
for t in tickers:
    html = requests.get(t).text
    officer_table = soup.find('table', {"class" : "dataTable"})
    for row in officer_table.find_all('tr'):
        cols = row.find_all('td')
        if len(cols) == 4:
            board_members.append((t, cols[0].text.strip(), cols[1].text.strip(), cols[2].text.strip(), cols[3].text.strip()))

for t, name, age, year_joined, position in board_members:
    output.append(('{} {:35} {} {} {}'.format(t, name, age, year_joined, position)))
```
```python
soup = BeautifulSoup(html, "html.parser")
for t in tickers:
    html = requests.get(t).text
    officer_table = soup.find('table', {"class" : "dataTable"})
```
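You create `soup` outside the for loop. In a fresh script this raises an error, because `html` does not exist yet when `BeautifulSoup(html, "html.parser")` runs. If `html` happens to be left over from an earlier run (for example in an interactive session), `soup` is parsed once from that old page, so every iteration searches the same table and you get the same rows repeated for each URL.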
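A minimal illustration of that second case, using two tiny HTML snippets in place of the real Reuters pages (the `pages` dict and the officer text are made up for this example, not taken from the site):

```python
from bs4 import BeautifulSoup

# Hypothetical stand-ins for the two downloaded pages
pages = {
    "AAPL": "<table class='dataTable'><tr><td>AAPL officer</td></tr></table>",
    "GOOG": "<table class='dataTable'><tr><td>GOOG officer</td></tr></table>",
}

html = pages["AAPL"]  # left over from an earlier run
soup = BeautifulSoup(html, "html.parser")  # parsed once, before the loop

names = []
for t in pages:
    html = pages[t]  # rebinding html does not change the existing soup
    cell = soup.find("table", {"class": "dataTable"}).find("td")
    names.append((t, cell.text))

print(names)
# [('AAPL', 'AAPL officer'), ('GOOG', 'AAPL officer')]
```

Reassigning `html` inside the loop has no effect on the `soup` object that was already built, so both tickers report the first page's row.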
Just move it into the loop, after `html` is assigned:
```python
for t in tickers:
    html = requests.get(t).text
    soup = BeautifulSoup(html, "html.parser")
    officer_table = soup.find('table', {"class" : "dataTable"})
```
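For reference, a minimal sketch of the whole script with that fix applied, split so the parsing can be tried without a network connection. The helper names `parse_officers`, `fetch_html` and `scrape` are hypothetical, and the page structure (a `table` with class `dataTable` whose officer rows have four `td` cells) is assumed from the question; the live Reuters page may no longer look like this.

```python
from bs4 import BeautifulSoup


def fetch_html(url):
    """Download one page (hypothetical helper)."""
    import requests  # imported here so the parser can be used offline
    return requests.get(url, timeout=10).text


def parse_officers(url, html):
    """Return (url, name, age, year_joined, position) tuples for one page."""
    soup = BeautifulSoup(html, "html.parser")  # a fresh soup for every page
    table = soup.find("table", {"class": "dataTable"})
    rows = []
    if table is None:  # layout changed or the request was blocked
        return rows
    for row in table.find_all("tr"):
        cols = row.find_all("td")
        if len(cols) == 4:  # skip header rows and other layouts
            rows.append((url, *(c.text.strip() for c in cols)))
    return rows


def scrape(urls, fetch=fetch_html):
    """Fetch and parse each URL in turn, collecting every officer row."""
    board_members = []
    for t in urls:
        html = fetch(t)  # this URL's page, downloaded inside the loop
        board_members.extend(parse_officers(t, html))
    return board_members
```

With the `tickers` list from the question, `scrape(tickers)` builds the same kind of tuples as the original `board_members` list, and the existing `'{} {:35} {} {} {}'.format(...)` loop can format them unchanged. Passing a different `fetch` function (for example a dict's `.get` over saved HTML) lets you test the parsing without hitting the site.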