Reputation: 35
I'm using beautifulsoup4 to scrape data from the lyrics.com website, specifically this link: https://www.lyrics.com/album/1447935. From this block, I'm trying to extract both <a> elements:
[<table class="tdata">
<colgroup>
<col style="width: 50px;"/>
<col style="width: 430px;"/>
<col style="width: 80px;"/>
<col style="width: 80px;"/>
</colgroup>
<thead>
<tr>
<th>#</th>
<th>Song</th>
<th>Duration</th>
<th> </th>
</tr>
</thead>
<tbody>
<tr>
<td class="tal qx">1</td>
<td class="tal qx">
<strong>
<a href="/lyric/15183453/Make+You+Feel+My+Love">Make You Feel My Love</a>
</strong>
</td>
<td class="tal qx">3:32</td>
<td class="tal vam rt">
</td></tr><tr><td class="tal qx">2</td>
<td class="tal qx">
<strong>
<a href="/lyric/15183454/Painting+Pictures">Painting Pictures</a>
</strong>
</td>
<td class="tal qx">3:33</td>
<td class="tal vam rt"> </td>
</tr>
</tbody>
</table>]
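This is my code:
import requests as r
from bs4 import BeautifulSoup as bs

album_url = "/album/1447935"  # e.g. the album from the link above
url = "http://www.lyrics.com" + album_url
page = r.get(url)
soup = bs(page.content, "html.parser")
songs = [a.get('href') for a in (table.find('a') for table in soup.findAll('table')) if a]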
However, it's only returning the first <a>:
['/lyric/15183453/Make+You+Feel+My+Love']
What could be wrong?
Edit: Thank you all for the answers! I upvoted, but I don't have enough rep for it to show.
Upvotes: 1
Views: 161
Reputation: 633
Other solutions work fine; however, I prefer using good old CSS selectors:
from bs4 import BeautifulSoup as bs
import requests as req
page = req.get('https://www.lyrics.com/album/1447935')
soup = bs(page.content, 'html.parser')
links = soup.select('table.tdata a[href]')
print(links)
This will print:
[<a href="/lyric/15183453/Make+You+Feel+My+Love">Make You Feel My Love</a>, <a href="/lyric/15183454/Painting+Pictures">Painting Pictures</a>]
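If you aren't familiar with selectors: table.tdata a[href] matches every a element that has an href attribute and sits inside a table element with the class tdata. Note that select() returns the a elements themselves, not their href values; use link['href'] on each one to get the URL.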
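A minimal offline sketch of the same selector, using a trimmed copy of the table from the question as an inline string (the HTML variable is only for illustration, so no request is made). It reads href from each matched element to get plain URL strings instead of Tag objects:

```python
from bs4 import BeautifulSoup

# Trimmed copy of the table from the question (illustrative only),
# so this runs without fetching the page.
HTML = """
<table class="tdata">
  <tbody>
    <tr><td class="tal qx">1</td>
        <td class="tal qx"><strong><a href="/lyric/15183453/Make+You+Feel+My+Love">Make You Feel My Love</a></strong></td></tr>
    <tr><td class="tal qx">2</td>
        <td class="tal qx"><strong><a href="/lyric/15183454/Painting+Pictures">Painting Pictures</a></strong></td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")

# select() returns Tag objects; pull the href attribute out of each one.
hrefs = [a["href"] for a in soup.select("table.tdata a[href]")]
print(hrefs)
```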
Upvotes: 1
Reputation: 844
This will work:
songs = [song['href'] for song in soup.select('table a')]
Output:
['/lyric/15183453/Make+You+Feel+My+Love', '/lyric/15183454/Painting+Pictures']
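For comparison, a sketch of the question's list comprehension rewritten with find_all(), again assuming a trimmed copy of the table as a hypothetical inline HTML string. The nested for walks every link in every table instead of taking one link per table:

```python
from bs4 import BeautifulSoup

# Trimmed copy of the table from the question (illustrative only).
HTML = """
<table class="tdata">
  <tbody>
    <tr><td class="tal qx">1</td>
        <td class="tal qx"><strong><a href="/lyric/15183453/Make+You+Feel+My+Love">Make You Feel My Love</a></strong></td></tr>
    <tr><td class="tal qx">2</td>
        <td class="tal qx"><strong><a href="/lyric/15183454/Painting+Pictures">Painting Pictures</a></strong></td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")

# find_all("a") (not find("a")) returns every link in a table, and the
# second "for" flattens the per-table results into a single list.
songs = [
    a.get("href")
    for table in soup.find_all("table")
    for a in table.find_all("a")
]
print(songs)
```

Keep in mind that 'table a' matches links in any table on the page, so on the live page it could pick up extra links if other tables contain them; the table.tdata selector from the other answer is narrower.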
Upvotes: 1
Reputation: 35
Was able to make it work with:
for a in soup.findAll('a'):
    if a.parent.name == 'strong':
        if a.parent.parent.name == 'td':
            print(a["href"])
Still not sure why the other method doesn't work, though, since I've used it elsewhere in my program with no issues.
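As for why the original version returns only one link: the comprehension calls find('a') once per table, and find() returns only the first match (or None, which the if a filter then drops), so each table contributes at most one link. find_all() returns every match. A minimal sketch with a hypothetical two-link table:

```python
from bs4 import BeautifulSoup

# Hypothetical table with two links, just to show the difference.
HTML = """<table>
<tr><td><a href="/one">One</a></td></tr>
<tr><td><a href="/two">Two</a></td></tr>
</table>"""

soup = BeautifulSoup(HTML, "html.parser")
table = soup.find("table")

first = table.find("a")        # a single Tag: only the first match
every = table.find_all("a")    # a list-like ResultSet: all matches

print(first["href"])               # /one
print([a["href"] for a in every])  # ['/one', '/two']
```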
Upvotes: 1