Reputation: 53
I'm trying to collect specific URLs from a website and save them in an array.
The problem is that I can't figure out how to search for just those links.
From the whole page I only want the links with href=/pubmed/...
Here is my code so far:
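import requests
from bs4 import BeautifulSoup

url = "https://www.ncbi.nlm.nih.gov/pubmed/?term=John+B.+Goodenough"
response = requests.get(url)  # fetch the page; `response` was otherwise undefined
soup = BeautifulSoup(response.content, 'lxml')
for link in soup.find_all('a'):
    print(link.get('href'))  # prints the href of every <a> tag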
But when I run the code above, I get all the links on the page, not only the ones I want.
Upvotes: 0
Views: 27
Reputation: 581
Try filtering for only links with the "pubmed" substring.
Try replacing your for loop with the following:
for link in soup.find_all('a', href=True):  # skip <a> tags that have no href
    href = link['href']
    if "pubmed" in href:  # also matches when "pubmed" is at index 0
        print(href)
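Since you said you want the links saved in an array rather than printed, here is a rough sketch of how that might look. It assumes the links you want start with "/pubmed/" (a stricter test than the substring check), and it parses a small made-up HTML string instead of the live page so it runs without a network connection. The names SAMPLE_HTML and collect_pubmed_links are mine, not part of any library, and the hrefs in the sample are invented.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the downloaded page.
SAMPLE_HTML = """
<a href="/pubmed/12345">Result one</a>
<a href="/pubmed/67890">Result two</a>
<a href="/books/">Books</a>
<a name="top">Anchor without href</a>
<a href="https://example.com/about-pubmed">About</a>
"""


def collect_pubmed_links(html):
    """Return every href that starts with /pubmed/, in document order."""
    soup = BeautifulSoup(html, "html.parser")
    # href=True skips <a> tags that have no href attribute at all.
    return [
        a["href"]
        for a in soup.find_all("a", href=True)
        if a["href"].startswith("/pubmed/")
    ]


pubmed_links = collect_pubmed_links(SAMPLE_HTML)
print(pubmed_links)  # ['/pubmed/12345', '/pubmed/67890']
```

With the real page you would pass response.content in place of SAMPLE_HTML. Note that the last sample link contains "pubmed" but is not a /pubmed/ path, so the prefix test leaves it out where a plain substring test would keep it.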
Upvotes: 1