Nate Walker

Reputation: 217

Extracting a Specific Part of a Link Using Beautiful Soup

Below is a section of my web scraper that scrapes a team roster from this website, puts the player information into arrays, and exports the arrays as columns in a CSV file. My scraper works fine, but I would also like to pull the player's ID number, which is embedded in the href attribute of the player's link.

<a href="/player/542882/matt-andriese">Matt Andriese</a>

As you can see from my code, I am already searching for ('a') to extract the player name (Matt Andriese), but I also want to extract the player ID number embedded within the link (542882). Does anyone know how to solve this problem? Thanks in advance!

import requests
import csv
from bs4 import BeautifulSoup

page = requests.get('http://m.rays.mlb.com/roster/')
soup = BeautifulSoup(page.text, 'html.parser')

# Drop the navigation tabs and the right-hand sidebar so their links and
# cells don't end up in the roster lookups below.
soup.find(class_='nav-tabset-container').decompose()
soup.find(class_='column secondary span-5 right').decompose()

# Pull each column of the roster table into its own list.
roster = soup.find(class_='layout layout-roster')
names = [n.contents[0] for n in roster.find_all('a')]
number = [n.contents[0] for n in roster.find_all('td', index='0')]
handedness = [n.contents[0] for n in roster.find_all('td', index='3')]
height = [n.contents[0] for n in roster.find_all('td', index='4')]
weight = [n.contents[0] for n in roster.find_all('td', index='5')]
DOB = [n.contents[0] for n in roster.find_all('td', index='6')]
team = [soup.find('meta', property='og:site_name')['content']] * len(names)

with open('MLB_Active_Roster.csv', 'w', newline='') as fp:
    f = csv.writer(fp)
    f.writerow(['Name', 'Number', 'Hand', 'Height', 'Weight', 'DOB', 'Team'])
    f.writerows(zip(names, number, handedness, height, weight, DOB, team))

Upvotes: 0

Views: 75

Answers (2)

Sufian Latif

Reputation: 13356

If link is the BeautifulSoup object for the anchor tag, you can read the href value as link['href']. To be safe, you may want to confirm the tag actually has an href attribute first, for example with link.has_attr('href'). Once you have the URL, split it on the '/' characters.

In your case, you could do something like this:

ids = [n['href'].split('/')[2] for n in roster.find_all('a')]
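
For example, building on the script in your question, a minimal sketch (assuming every player link keeps the /player/<id>/<name> shape and that anchors without an href can simply be skipped) might look like this:

# A minimal, self-contained sketch based on the question's setup; it assumes
# the roster page still serves /player/<id>/<name> links like the one above.
import requests
from bs4 import BeautifulSoup

page = requests.get('http://m.rays.mlb.com/roster/')
soup = BeautifulSoup(page.text, 'html.parser')
roster = soup.find(class_='layout layout-roster')

players = []
for link in roster.find_all('a'):
    if not link.has_attr('href'):          # skip anchors that carry no href at all
        continue
    parts = link['href'].split('/')        # ['', 'player', '542882', 'matt-andriese']
    if len(parts) > 2 and parts[2].isdigit():
        players.append((link.contents[0], parts[2]))

names = [name for name, _ in players]
ids = [player_id for _, player_id in players]   # e.g. '542882' for Matt Andriese

Because names and ids are built from the same filtered links, the two lists stay aligned, so ids can go straight into your csv.writerows call as another column.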

Upvotes: 1

Ajax1234

Reputation: 71451

You can use re:

import requests, re
from bs4 import BeautifulSoup as soup

d = soup(requests.get('http://m.mlb.com/tb/roster').text, 'html.parser')
# (tag, class, optional extractor): the extractor pulls the img src or the
# numeric player id out of the href; cells without one just use their text.
headers = [['td', 'dg-jersey_number'],
           ['td', 'dg-player_headshot', lambda x: x.find('img')['src']],
           ['td', 'dg-name_display_first_last', lambda x: re.findall(r'\d+', x.find('a')['href'])[0]],
           ['td', 'dg-bats_throws'],
           ['td', 'dg-height'],
           ['td', 'dg-weight'],
           ['td', 'dg-date_of_birth']]

def get_data(row):
    # For each header, apply either the plain-text getter or the custom extractor.
    return [[lambda x: x.text, None if not c else c[0]][bool(c)](row.find(a, {'class': b})) for a, b, *c in headers]

final_results = [get_data(i) for i in d.find_all('tr', {'index': re.compile(r'\d+')})]

Output:

[['46', 'http://gdx.mlb.com/images/gameday/mugshots/mlb/621237@2x.jpg', '621237', 'L/L', '6\'2"', '245lbs', '5/21/95'],
 ['35', 'http://gdx.mlb.com/images/gameday/mugshots/mlb/542882@2x.jpg', '542882', 'R/R', '6\'2"', '225lbs', '8/28/89'],
 ['22', 'http://gdx.mlb.com/images/gameday/mugshots/mlb/502042@2x.jpg', '502042', 'R/R', '6\'2"', '195lbs', '9/26/88'],
 ['63', 'http://gdx.mlb.com/images/gameday/mugshots/mlb/650895@2x.jpg', '650895', 'R/R', '6\'3"', '240lbs', '1/18/94'],
 ['24', 'http://gdx.mlb.com/images/gameday/mugshots/mlb/543135@2x.jpg', '543135', 'R/R', '6\'2"', '225lbs', '2/13/90'],
 ['58', 'http://gdx.mlb.com/images/gameday/mugshots/mlb/629496@2x.jpg', '629496', 'R/R', '6\'0"', '220lbs', '11/4/93'],
 ['36', 'http://gdx.mlb.com/images/gameday/mugshots/mlb/552640@2x.jpg', '552640', 'R/R', '6\'1"', '200lbs', '3/17/90'],
 ['56', 'http://gdx.mlb.com/images/gameday/mugshots/mlb/592473@2x.jpg', '592473', 'L/L', '6\'3"', '205lbs', '1/14/89'],
 ['54', 'http://gdx.mlb.com/images/gameday/mugshots/mlb/489265@2x.jpg', '489265', 'R/R', '5\'11"', '185lbs', '3/4/83'],
 ['57', 'http://gdx.mlb.com/images/gameday/mugshots/mlb/621289@2x.jpg', '621289', 'R/R', '5\'10"', '200lbs', '6/20/91'],
 ['4', 'http://gdx.mlb.com/images/gameday/mugshots/mlb/605483@2x.jpg', '605483', 'L/L', '6\'4"', '200lbs', '12/4/92'],
 ['55', 'http://gdx.mlb.com/images/gameday/mugshots/mlb/592773@2x.jpg', '592773', 'R/R', '6\'4"', '215lbs', '7/26/91'],
 ['61', 'http://gdx.mlb.com/images/gameday/mugshots/mlb/621056@2x.jpg', '621056', 'R/R', '6\'1"', '165lbs', '8/12/93'],
 ['48', 'http://gdx.mlb.com/images/gameday/mugshots/mlb/642232@2x.jpg', '642232', 'R/L', '6\'5"', '205lbs', '12/31/91'],
 ['40', 'http://gdx.mlb.com/images/gameday/mugshots/mlb/467092@2x.jpg', '467092', 'R/R', '6\'1"', '245lbs', '8/10/87'],
 ['45', 'http://gdx.mlb.com/images/gameday/mugshots/mlb/491696@2x.jpg', '491696', 'R/R', '6\'0"', '200lbs', '4/30/88'],
 ['9', 'http://gdx.mlb.com/images/gameday/mugshots/mlb/641343@2x.jpg', '641343', 'L/L', '6\'1"', '195lbs', '10/6/95'],
 ['26', 'http://gdx.mlb.com/images/gameday/mugshots/mlb/596847@2x.jpg', '596847', 'L/R', '6\'1"', '230lbs', '5/19/91'],
 ['44', 'http://gdx.mlb.com/images/gameday/mugshots/mlb/543068@2x.jpg', '543068', 'R/R', '6\'4"', '235lbs', '1/5/90'],
 ['5', 'http://gdx.mlb.com/images/gameday/mugshots/mlb/622110@2x.jpg', '622110', 'R/R', '6\'2"', '170lbs', '1/15/91'],
 ['11', 'http://gdx.mlb.com/images/gameday/mugshots/mlb/588751@2x.jpg', '588751', 'R/R', '6\'0"', '195lbs', '4/15/89'],
 ['28', 'http://gdx.mlb.com/images/gameday/mugshots/mlb/621002@2x.jpg', '621002', 'R/R', '5\'11"', '200lbs', '3/22/94'],
 ['18', 'http://gdx.mlb.com/images/gameday/mugshots/mlb/621563@2x.jpg', '621563', 'L/R', '6\'1"', '190lbs', '4/26/90'],
 ['27', 'http://gdx.mlb.com/images/gameday/mugshots/mlb/460576@2x.jpg', '460576', 'R/R', '6\'3"', '220lbs', '12/4/85'],
 ['39', 'http://gdx.mlb.com/images/gameday/mugshots/mlb/595281@2x.jpg', '595281', 'L/R', '6\'1"', '215lbs', '4/22/90'],
 ['0', 'http://gdx.mlb.com/images/gameday/mugshots/mlb/605480@2x.jpg', '605480', 'L/R', '5\'10"', '180lbs', '5/6/93']]

Note that the output contains the player ID as the third element in each sublist.
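
If you prefer to stay closer to the code in your question, a small sketch of the same re idea (assuming every player href matches the /player/<id>/<name> pattern shown there) can pull the IDs out of the anchors you already collect:

# A sketch applying re to the question's original roster links; it assumes
# every player href matches the /player/<id>/<name> pattern from the example.
import re
import requests
from bs4 import BeautifulSoup

page = requests.get('http://m.rays.mlb.com/roster/')
soup = BeautifulSoup(page.text, 'html.parser')
roster = soup.find(class_='layout layout-roster')

ids = [re.search(r'/player/(\d+)/', a['href']).group(1)
       for a in roster.find_all('a', href=re.compile(r'/player/\d+/'))]
print(ids)  # e.g. ['621237', '542882', ...]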

Upvotes: 1
