I am scraping from this page: https://www.pro-football-reference.com/years/2018/week_1.htm
It is a list of game scores for American football. I want to open the link to the stats for the first game. The text displayed for that link is "Final". My code so far:
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
# assign the URL
my_url = "https://www.pro-football-reference.com/years/2018/week_1.htm"
# opening up connection, grabbing the page
raw_page = uReq(my_url)
page_html = raw_page.read()
raw_page.close()
# html parsing
page_soup = soup(page_html,"html.parser")
# find all games on the page
games = page_soup.findAll("div",{"class":"game_summary expanded nohover"})
link = games[0].find("td",{"class":"right gamelink"})
print(link)
When I run this, I receive the following output:
<a href="/boxscores/201809060phi.htm">Final</a>
How do I assign only the link's target (the href value, i.e. "/boxscores/201809060phi.htm") to a variable?
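Call find('a') on the table cell to get the anchor tag, then read its href attribute by indexing the tag like a dictionary:
link = games[0].find("td",{"class":"right gamelink"}).find('a')
print(link['href'])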
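A slightly more defensive sketch of the same approach, run against a hard-coded HTML snippet that is assumed to mirror the markup shown in the question's output (the live page may differ, and the snippet avoids a network request). It uses the tag's .get("href"), which returns None instead of raising KeyError when the attribute is missing, and urllib.parse.urljoin to turn the relative path into an absolute URL that could then be passed to urlopen.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical snippet assumed to mirror the page's markup for one game.
html = """
<div class="game_summary expanded nohover">
  <table><tr>
    <td class="right gamelink"><a href="/boxscores/201809060phi.htm">Final</a></td>
  </tr></table>
</div>
"""

page_soup = BeautifulSoup(html, "html.parser")
games = page_soup.find_all("div", {"class": "game_summary expanded nohover"})

# Locate the table cell first, then the <a> tag inside it.
cell = games[0].find("td", {"class": "right gamelink"})
anchor = cell.find("a") if cell is not None else None

# .get() returns None instead of raising KeyError if href is absent.
href = anchor.get("href") if anchor is not None else None
print(href)

# Resolve the relative path against the page URL to get an absolute link.
base_url = "https://www.pro-football-reference.com/years/2018/week_1.htm"
full_url = urljoin(base_url, href) if href else None
print(full_url)
```

Because the href starts with "/", urljoin keeps only the scheme and host of the base URL, giving https://www.pro-football-reference.com/boxscores/201809060phi.htm.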