Oasis101

Reputation: 31

How to extract the links using BeautifulSoup

How do I extract the link from the following HTML:

<li><i class="fas fa-file-pdf"></i> <a href="https://manoa.hawaii.edu/miro/wp-content/uploads/2022/03/AnalysisBrief_CommonDataSet_2021.docx.pdf" rel="noopener" target="_blank">2021-2022 Common Data Set</a></li>

Upvotes: 0

Views: 131

Answers (1)

HedgeHog

Reputation: 25048

Use a list comprehension with CSS selectors to get a list of links. To select all links whose href ends with .pdf:

[a['href'] for a in soup.select('a[href$=".pdf"]')]

or, to be more specific, select each <a> that has an href and immediately follows an <i> with the class fa-file-pdf:

[a['href'] for a in soup.select('li i.fa-file-pdf + a[href]')]
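The two selectors are not interchangeable, which a small comparison may make clearer. This is only a sketch: the markup and the example.com URLs below are made up for illustration and are not from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a PDF link with the icon, a PDF link without it,
# and an icon followed by a link that is not a PDF.
html = '''
<ul>
  <li><i class="fas fa-file-pdf"></i> <a href="https://example.com/a.pdf">A</a></li>
  <li><a href="https://example.com/b.pdf">B</a></li>
  <li><i class="fas fa-file-pdf"></i> <a href="https://example.com/c.html">C</a></li>
</ul>
'''
soup = BeautifulSoup(html, 'html.parser')

# Matches on the URL suffix only, regardless of any icon.
by_suffix = [a['href'] for a in soup.select('a[href$=".pdf"]')]

# Matches on the icon only, regardless of what the URL ends with.
by_icon = [a['href'] for a in soup.select('li i.fa-file-pdf + a[href]')]

print(by_suffix)  # a.pdf and b.pdf
print(by_icon)    # a.pdf and c.html
```

Which one fits depends on whether the file extension or the icon is the more reliable marker on the page being scraped.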

If the goal is to extract only the first link:

link = [a['href'] for a in soup.select('a[href$=".pdf"]')][0]

or

link = soup.select_one('a[href$=".pdf"]')['href']
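Both variants assume at least one match: indexing an empty list with [0] raises IndexError, and select_one() returns None when nothing matches, so subscripting its result raises TypeError. A minimal defensive sketch, where the helper name first_pdf_link is hypothetical and not part of BeautifulSoup:

```python
from bs4 import BeautifulSoup

def first_pdf_link(html):
    """Return the first href ending in .pdf, or None if there is no match.

    Hypothetical helper for illustration; not a BeautifulSoup API.
    """
    soup = BeautifulSoup(html, 'html.parser')
    tag = soup.select_one('a[href$=".pdf"]')
    # select_one() returns None when the selector matches nothing.
    return tag['href'] if tag is not None else None

print(first_pdf_link('<a href="report.pdf">Report</a>'))
print(first_pdf_link('<a href="report.html">Report</a>'))
```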

Example

from bs4 import BeautifulSoup

html = '''
<li><i class="fas fa-file-pdf"></i> <a href="https://manoa.hawaii.edu/miro/wp-content/uploads/2022/03/AnalysisBrief_CommonDataSet_2021.docx.pdf" rel="noopener" target="_blank">2021-2022 Common Data Set</a></li>
<li><i class="fas fa-file-pdf"></i> <a href="https://manoa.hawaii.edu/miro/wp-content/uploads/2022/03/AnalysisBrief_CommonDataSet_2021.docx.pdf" rel="noopener" target="_blank">2021-2022 Common Data Set</a></li>
<li><i class="fas fa-file-pdf"></i> <a href="https://manoa.hawaii.edu/miro/wp-content/uploads/2022/03/AnalysisBrief_CommonDataSet_2021.docx.pdf" rel="noopener" target="_blank">2021-2022 Common Data Set</a></li>
<li><i class="fas fa-file-pdf"></i> <a href="https://manoa.hawaii.edu/miro/wp-content/uploads/2022/03/AnalysisBrief_CommonDataSet_2021.docx.pdf" rel="noopener" target="_blank">2021-2022 Common Data Set</a></li>
'''
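# Name the parser explicitly to avoid the "no parser was explicitly specified" warning
soup = BeautifulSoup(html, 'html.parser')

# Collect the href of every <a> whose href ends with .pdf
urlList = [a['href'] for a in soup.select('a[href$=".pdf"]')]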

Output

['https://manoa.hawaii.edu/miro/wp-content/uploads/2022/03/AnalysisBrief_CommonDataSet_2021.docx.pdf',
 'https://manoa.hawaii.edu/miro/wp-content/uploads/2022/03/AnalysisBrief_CommonDataSet_2021.docx.pdf',
 'https://manoa.hawaii.edu/miro/wp-content/uploads/2022/03/AnalysisBrief_CommonDataSet_2021.docx.pdf',
 'https://manoa.hawaii.edu/miro/wp-content/uploads/2022/03/AnalysisBrief_CommonDataSet_2021.docx.pdf']

Upvotes: 1
