Reputation: 217
Below is a web scraper that successfully pulls roster information from a team's website and exports it to a CSV file. As you can see, each team's website follows the same URL pattern.
http://m.redsox.mlb.com/roster/
http://m.yankees.mlb.com/roster/
I am trying to create a loop that goes through each team's website, scrapes each player's roster information, and writes it to a CSV file. At the beginning of my code, I created a dictionary of team names and formatted each name into the URL used to request a page. This strategy partly worked; however, the output CSV only contains the data from the last page in my team_list dictionary. Does anyone know how to alter this code so that it keeps the data from all the pages in the team_list dictionary? Thanks in advance!
import requests
import csv
from bs4 import BeautifulSoup

team_list = {'yankees', 'redsox'}

for team in team_list:
    page = requests.get('http://m.{}.mlb.com/roster/'.format(team))
    soup = BeautifulSoup(page.text, 'html.parser')
    soup.find(class_='nav-tabset-container').decompose()
    soup.find(class_='column secondary span-5 right').decompose()
    roster = soup.find(class_='layout layout-roster')
    names = [n.contents[0] for n in roster.find_all('a')]
    ids = [n['href'].split('/')[2] for n in roster.find_all('a')]
    number = [n.contents[0] for n in roster.find_all('td', index='0')]
    handedness = [n.contents[0] for n in roster.find_all('td', index='3')]
    height = [n.contents[0] for n in roster.find_all('td', index='4')]
    weight = [n.contents[0] for n in roster.find_all('td', index='5')]
    DOB = [n.contents[0] for n in roster.find_all('td', index='6')]
    team = [soup.find('meta', property='og:site_name')['content']] * len(names)

    with open('MLB_Active_Roster.csv', 'w', newline='') as fp:
        f = csv.writer(fp)
        f.writerow(['Name', 'ID', 'Number', 'Hand', 'Height', 'Weight', 'DOB', 'Team'])
        f.writerows(zip(names, ids, number, handedness, height, weight, DOB, team))
Upvotes: 0
Views: 127
Reputation: 523
Your loop does visit both pages — the problem is that you open MLB_Active_Roster.csv in write mode ('w') inside the loop, so each iteration truncates the file and overwrites the previous team's rows. Accumulate the rows in a list as you loop and write the file once at the end. (Note also that team_list is a set, not a dictionary; a list keeps the iteration order predictable.)
import requests
import pandas as pd
from bs4 import BeautifulSoup

team_list = ['yankees', 'redsox']
output = []

for team in team_list:
    page = requests.get('http://m.{}.mlb.com/roster/'.format(team))
    soup = BeautifulSoup(page.text, 'html.parser')
    soup.find(class_='nav-tabset-container').decompose()
    soup.find(class_='column secondary span-5 right').decompose()
    roster = soup.find(class_='layout layout-roster')
    names = [n.contents[0] for n in roster.find_all('a')]
    ids = [n['href'].split('/')[2] for n in roster.find_all('a')]
    number = [n.contents[0] for n in roster.find_all('td', index='0')]
    handedness = [n.contents[0] for n in roster.find_all('td', index='3')]
    height = [n.contents[0] for n in roster.find_all('td', index='4')]
    weight = [n.contents[0] for n in roster.find_all('td', index='5')]
    DOB = [n.contents[0] for n in roster.find_all('td', index='6')]
    team_name = [soup.find('meta', property='og:site_name')['content']] * len(names)

    # Accumulate one row per player instead of writing the file inside the loop
    output.extend(zip(names, ids, number, handedness, height, weight, DOB, team_name))

pd.DataFrame(output, columns=['Name', 'ID', 'Number', 'Hand', 'Height', 'Weight', 'DOB', 'Team']).to_csv('csvfilename.csv', index=False)
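The root cause is worth spelling out: the original code reopens the CSV in 'w' mode on every pass through the loop, and 'w' truncates the file. A minimal sketch (using made-up stand-in rows instead of real scraping) shows both the overwriting behaviour and the open-once fix:

```python
import csv
import os
import tempfile

# Hypothetical stand-in data in place of the scraped rosters
rows_by_team = {'yankees': [['Judge']], 'redsox': [['Devers']]}

path = os.path.join(tempfile.gettempdir(), 'roster_demo.csv')

# Bug reproduction: 'w' mode truncates the file on every iteration,
# so only the last team's rows survive.
for team, rows in rows_by_team.items():
    with open(path, 'w', newline='') as fp:
        csv.writer(fp).writerows(rows)

with open(path) as fp:
    buggy_result = fp.read()  # only the last team's row remains

# Fix: open the file once, before the loop (or accumulate the rows
# and write them in a single pass after the loop, as in the answer).
with open(path, 'w', newline='') as fp:
    writer = csv.writer(fp)
    for team, rows in rows_by_team.items():
        writer.writerows(rows)
```

Appending ('a' mode) inside the loop would also avoid the truncation, but then the header row needs special handling; writing once after the loop is the simplest approach.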
Upvotes: 1