How to extract partial text from href using BeautifulSoup in Python

Question

Here's my code:

for item in data:
    print(item.find_all('td')[2].find('a'))
    print(item.find('span').text.strip())
    print(item.find_all('td')[3].text)
    print(item.find_all('td')[2].find(target="_blank").string.strip())

It prints this text below.

16-399. 

Perry v. Merit Systems Protection Bd.

04/17/17

16-399.

All I want from the href tag is this part: 16-399_3f14

How can I do that? Thanks.

Joe.Ingalls · Accepted Answer

You can use find_all to pull the anchor elements that have an href attribute, and then parse the href values for the information you are looking for.

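from bs4 import BeautifulSoup

# html holds the page markup, including the <a href="..."> element.
html = '''16-399. '''

soup = BeautifulSoup(html, 'html.parser')

# href=True keeps only anchors that actually carry an href attribute.
for a in soup.find_all('a', href=True):
    url = a['href'].split('/')
    print(url[-1])  # last path segment, i.e. the file name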

This should output the string you are looking for.

16-399_3f14.pdf
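
The output above still carries the .pdf extension, while the question asks for just 16-399_3f14. One possible way to drop it is sketched below, using only the standard library. The helper name href_stem and the example URL are assumptions for illustration: the host and directory are placeholders, and only the file name comes from the output above.

```python
import posixpath
from urllib.parse import urlparse


def href_stem(href):
    """Return the last path segment of an href, without its file extension."""
    path = urlparse(href).path               # ignore any ?query or #fragment
    filename = posixpath.basename(path)      # e.g. '16-399_3f14.pdf'
    stem, _extension = posixpath.splitext(filename)
    return stem


# Hypothetical href: the host and directory are placeholders; only the
# file name 16-399_3f14.pdf is taken from the output above.
example_href = "https://example.com/opinions/16-399_3f14.pdf"
print(href_stem(example_href))  # 16-399_3f14
```

Inside the loop, this would be called as print(href_stem(a['href'])) in place of splitting on '/'. Going through urlparse rather than a plain split means a trailing query string or fragment on the link should not end up in the result.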
