Obtain specific href from a web page

Question

I'm trying to obtain specific URLs from a website to save them in an array.

The problem is that I can't figure out how to search for the specific links.

From this whole website I want to obtain only the href=/pubmed/...

Here is my code so far:

import requests
from bs4 import BeautifulSoup

url = "https://www.ncbi.nlm.nih.gov/pubmed/?term=John+B.+Goodenough"
response = requests.get(url)  # fetch the page so that `response` is defined
soup = BeautifulSoup(response.content, 'lxml')

for link in soup.find_all('a'):
    print(link.get('href'))

But when I run the code above I obtain all the links and not only the specific ones that I want.

Jeff Huang · Accepted Answer

Filter for only the links whose href contains the "pubmed" substring.

Try replacing your for loop with the following:

for link in soup.find_all('a'):
    href = link.get('href')
    # get() returns None for <a> tags without an href, so check that first
    if href and "pubmed" in href:
        print(href)
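
Since the question asks for the URLs saved in an array rather than printed, here is a minimal sketch of one way to do that. The helper name `collect_pubmed_links` and the sample href values are made up for illustration. Matching on the `/pubmed/` prefix and turning the relative links into absolute URLs with `urljoin` are assumptions about what is wanted, not something the page guarantees.

```python
from urllib.parse import urljoin

# Search URL from the question, used as the base for relative links
BASE_URL = "https://www.ncbi.nlm.nih.gov/pubmed/?term=John+B.+Goodenough"


def collect_pubmed_links(hrefs, base_url=BASE_URL):
    """Return absolute URLs for every href that starts with /pubmed/."""
    links = []
    for href in hrefs:
        # link.get('href') yields None for <a> tags without an href
        if href and href.startswith("/pubmed/"):
            links.append(urljoin(base_url, href))
    return links


# Made-up values standing in for [a.get('href') for a in soup.find_all('a')]
sample_hrefs = ["/pubmed/12345", None, "/home/about", "#top", "/pubmed/67890"]
pubmed_links = collect_pubmed_links(sample_hrefs)
print(pubmed_links)
```

With the soup from the question, the same helper could be called as `collect_pubmed_links(a.get('href') for a in soup.find_all('a'))`.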
