Nicolas

Reputation: 53

Obtain specific href from a web page

I'm trying to obtain specific URLs from a website to save them in an array.

The problem is that I can't figure out how to search for only those specific links.

[Screenshot of the website showing the specific href I'm looking for]

From the whole page, I only want the links with href=/pubmed/...

Here is my code so far:

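import requests
from bs4 import BeautifulSoup

url = "https://www.ncbi.nlm.nih.gov/pubmed/?term=John+B.+Goodenough"
response = requests.get(url)  # fetch the search results page
soup = BeautifulSoup(response.content, 'lxml')

# Print the href of every <a> tag on the page
for link in soup.find_all('a'):
    print(link.get('href'))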

But when I run the code above, I get all the links on the page, not only the ones I want.

Upvotes: 0

Views: 27

Answers (1)

Jeff Huang

Reputation: 581

Try keeping only the links whose href contains the substring "pubmed".

Try replacing your for loop with the following:

for link in soup.find_all('a'):
    href = link.get('href')
    # Skip <a> tags without an href and keep only the "pubmed" links
    if href and "pubmed" in href:
        print(href)
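Since the goal is to save the links in an array, it may also help to let BeautifulSoup do the filtering itself: find_all accepts an attribute filter, and a compiled regular expression passed as href= is matched against each tag's href value (tags with no href are skipped). The sketch below assumes the wanted links are relative hrefs of the form /pubmed/..., as described in the question. The function name pubmed_links, the BASE_URL constant, and the sample HTML are hypothetical, chosen only for illustration. It joins each href onto the base URL with urllib.parse.urljoin and drops duplicates while keeping page order.

```python
import re
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical base URL used to turn relative hrefs like "/pubmed/..." into absolute links.
BASE_URL = "https://www.ncbi.nlm.nih.gov"

# Assumed link format: hrefs that start with "/pubmed/".
PUBMED_HREF = re.compile(r"^/pubmed/")


def pubmed_links(html, base_url=BASE_URL):
    """Return absolute URLs of all <a href="/pubmed/..."> links, in page order, without duplicates."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    seen = set()
    # find_all matches the href attribute against the regex;
    # <a> tags that have no href at all do not match.
    for a in soup.find_all("a", href=PUBMED_HREF):
        url = urljoin(base_url, a["href"])
        if url not in seen:
            seen.add(url)
            links.append(url)
    return links


# Usage with a small hand-written HTML snippet (not real PubMed output):
sample = """
<a href="/pubmed/12345">Paper A</a>
<a href="/pmc/">PMC</a>
<a>no href</a>
<a href="/pubmed/12345">Paper A again</a>
<a href="/pubmed/67890">Paper B</a>
"""
print(pubmed_links(sample))
# ['https://www.ncbi.nlm.nih.gov/pubmed/12345', 'https://www.ncbi.nlm.nih.gov/pubmed/67890']
```

With the live page you would pass response.content instead of the sample string. Anchoring the regex with ^ is stricter than a plain substring check, so links that merely mention "pubmed" somewhere else in the URL are not collected.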

Upvotes: 1
