Reputation: 125
Here's my code:
for item in data:
    print(item.find_all('td')[2].find('a'))
    print(item.find('span').text.strip())
    print(item.find_all('td')[3].text)
    print(item.find_all('td')[2].find(target="_blank").string.strip())
It prints the following:
<a href="argument_transcripts/2016/16-399_3f14.pdf"
id="ctl00_ctl00_MainEditable_mainContent_rptTranscript_ctl01_hypFile"
target="_blank">16-399. </a>
Perry v. Merit Systems Protection Bd.
04/17/17
16-399.
All I want from the href attribute is this part: 16-399_3f14
How can I do that? Thanks.
Upvotes: 1
Views: 1864
Reputation: 196
You can use find_all to pull the anchor elements that have an href attribute, and then parse the href values for the information you are looking for.
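from bs4 import BeautifulSoup  # find_all is the BeautifulSoup 4 method name
html = '''<a href="argument_transcripts/2016/16-399_3f14.pdf"
id="ctl00_ctl00_MainEditable_mainContent_rptTranscript_ctl01_hypFile"
target="_blank">16-399. </a>'''
soup = BeautifulSoup(html, 'html.parser')
# href=True matches only anchors that actually carry an href attribute
for a in soup.find_all('a', href=True):
    # split the path on '/' and keep the last segment (the file name)
    url = a['href'].split('/')
    print(url[-1])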
This should output the last path segment of the href, which is the file name including its extension:
16-399_3f14.pdf
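Since the question asks for 16-399_3f14 without the .pdf suffix, the extension still has to be dropped. Here is a minimal sketch of one way to do that with the standard library only; the helper name file_stem is hypothetical (it is not part of BeautifulSoup), and it assumes the href is a URL path whose last segment is the file name.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def file_stem(href):
    """Return the last path segment of href without its extension.

    file_stem is a hypothetical helper name used for illustration.
    """
    # urlparse(...).path drops any query string or fragment, if present
    path = urlparse(href).path
    # .stem is the final path component with its last suffix removed
    return PurePosixPath(path).stem


print(file_stem("argument_transcripts/2016/16-399_3f14.pdf"))  # prints 16-399_3f14
```

Inside the loop above, this would be called as file_stem(a['href']) in place of splitting on '/'.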
Upvotes: 1