hsel

Reputation: 163

Beautifulsoup text from tags inside of tags

I'm trying to get the Artist results and the Song results, but I'm unsure how to get both of the values I want. This is what I have:

albums = soup.find('div', attrs={'class': 'panel-heading'}, text=re.compile('Artist results:'))
for i in albums.find_all('a'):
    print(i)

From the artist results I want: Asap Rocky, Asap Mob, Asap Ferg.

Similarly, from the song results I want: ASAP by T.I., ASAP by Eric Bellinger, etc.

<div class="panel">
    <div class="panel-heading"><b>Artist results:</b><br><small>[1-3 of 3 total <span class="text-lowercase">Artists</span> found]</small></div>
    <table class="table table-condensed">
        <tr><td class="text-left visitedlyr">
            1. <a href="https://www.azlyrics.com/a/asaprocky.html" target="_blank"><b>Asap Rocky</b></a></td>
        </tr>
        <tr><td class="text-left visitedlyr">
            2. <a href="https://www.azlyrics.com/a/asapmob.html" target="_blank"><b>Asap Mob</b></a></td>
        </tr>
        <tr><td class="text-left visitedlyr">
            3. <a href="https://www.azlyrics.com/a/asapferg.html" target="_blank"><b>Asap Ferg</b></a></td>
        </tr>
    </table>
</div>
<div class="panel">
    <div class="panel-heading"><b>Song results:</b><br><small>[1-5 of 454 total <span class="text-lowercase">Songs</span> found]</small></div>
    <table class="table table-condensed">
        <tr><td class="text-left visitedlyr">
            1. <a href="https://www.azlyrics.com/lyrics/ti/asap.html" target="_blank"><b>ASAP</b></a>  by <b>T.I.</b><br>
            <small>[Intro] <strong>Asap</strong>, <strong>asap</strong>, <strong>asap</strong> <strong>Asap</strong>, <strong>asap</strong>, <strong>asap</strong> Ay, ay, ay, ay, ay, you niggaz better exit <strong>Asap</strong>, <strong>asap</strong>, <strong>asap</strong>, <strong>asap</strong> Ay-s, ay-p, ay-s, ay-p <strong>Asap</strong>, <strong>asap</strong>, <strong>asap</strong>, <strong>asap</strong> Ay-s, ay-p, ay-s, ay-p <strong>Asap</strong>, <strong>asap</strong>, <strong>asap</strong>, <strong>asap</strong> A-s-a-p, A-S-A-P [Verse 1] I'm on my grind, grand h...</small></td>
        </tr>
        <tr><td class="text-left visitedlyr">
            2. <a href="https://www.azlyrics.com/lyrics/ericbellinger/asap.html" target="_blank"><b>ASAP</b></a>  by <b>Eric Bellinger</b><br>
            <small>ou say? Cause girl I need that <strong>asap</strong> [Hook] Girl I need that <strong>asap</strong>, <strong>asap</strong>, <strong>asap</strong>, I need Girl I need that <strong>asap</strong>, <strong>asap</strong>, <strong>asap</strong>, I need Girl I need that <strong>asap</strong>, <strong>asap</strong>, <strong>asap</strong>, baby I need Girl I'm tryina taste that, taste that, I need Girl I need that <strong>asap</strong>, <strong>asap</strong>, ...</small></td>
        </tr>

Upvotes: 1

Views: 87

Answers (2)

Keyur Potdar

Reputation: 7238

Getting the artist results is straightforward. First, find the <b>Artist results:</b> tag using soup.find('b', text='Artist results:'). Then find the table that holds the results using find_next('table').

artists_table = soup.find('b', text='Artist results:').find_next('table')
artists = [x.text for x in artists_table.find_all('a')]
print(artists)
# ['Asap Rocky', 'Asap Mob', 'Asap Ferg']
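A self-contained version of the above, run against the artist panel from the question, also shows why the question's original find() call returns None: the heading <div> has several children (<b>, <br>, <small>), so its .string is None and a text filter on the div never matches, whereas the <b> tag holds the text directly. This is a sketch assuming bs4 with the stdlib "html.parser"; it uses string=, which is the newer name for the text= argument.

```python
import re
from bs4 import BeautifulSoup

# Artist panel copied from the question's HTML
html = """
<div class="panel">
    <div class="panel-heading"><b>Artist results:</b><br><small>[1-3 of 3 total <span class="text-lowercase">Artists</span> found]</small></div>
    <table class="table table-condensed">
        <tr><td class="text-left visitedlyr">
            1. <a href="https://www.azlyrics.com/a/asaprocky.html" target="_blank"><b>Asap Rocky</b></a></td>
        </tr>
        <tr><td class="text-left visitedlyr">
            2. <a href="https://www.azlyrics.com/a/asapmob.html" target="_blank"><b>Asap Mob</b></a></td>
        </tr>
        <tr><td class="text-left visitedlyr">
            3. <a href="https://www.azlyrics.com/a/asapferg.html" target="_blank"><b>Asap Ferg</b></a></td>
        </tr>
    </table>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# The heading <div> has several children, so its .string is None and
# filtering the div by text finds nothing (the question's approach).
heading_div = soup.find("div", attrs={"class": "panel-heading"},
                        string=re.compile("Artist results:"))
print(heading_div)  # None

# The <b> tag contains only the heading text, so it matches; from there,
# find_next("table") jumps to the results table that follows it.
artists_table = soup.find("b", string="Artist results:").find_next("table")
artists = [a.get_text(strip=True) for a in artists_table.find_all("a")]
print(artists)  # ['Asap Rocky', 'Asap Mob', 'Asap Ferg']
```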

To get the song results, use the same approach to find the table. To build the text you want, though, you need one more step: join the text of the two <b> tags in each cell (the song title and the artist) with ' by '.

songs_table = soup.find('b', text='Song results:').find_next('table')
songs = [' by '.join(b.text for b in td.find_all('b')) for td in songs_table.find_all('td')]
print(songs)
# ['ASAP by T.I.', 'ASAP by Eric Bellinger']
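Joining every <b> in a cell works here because the lyric snippets use <strong> rather than <b>. If you would rather not rely on that, one alternative (a sketch, not the answer's method) is to anchor on the song link and take the first <b> sibling after it. The HTML below is the question's song panel with the lyric snippets shortened and the table closed.

```python
from bs4 import BeautifulSoup

# Song panel from the question, lyric snippets shortened
html = """
<div class="panel">
    <div class="panel-heading"><b>Song results:</b><br><small>[1-5 of 454 total <span class="text-lowercase">Songs</span> found]</small></div>
    <table class="table table-condensed">
        <tr><td class="text-left visitedlyr">
            1. <a href="https://www.azlyrics.com/lyrics/ti/asap.html" target="_blank"><b>ASAP</b></a>  by <b>T.I.</b><br>
            <small>[Intro] <strong>Asap</strong>, <strong>asap</strong> ...</small></td>
        </tr>
        <tr><td class="text-left visitedlyr">
            2. <a href="https://www.azlyrics.com/lyrics/ericbellinger/asap.html" target="_blank"><b>ASAP</b></a>  by <b>Eric Bellinger</b><br>
            <small>Girl I need that <strong>asap</strong> ...</small></td>
        </tr>
    </table>
</div>
"""
soup = BeautifulSoup(html, "html.parser")
songs_table = soup.find("b", string="Song results:").find_next("table")

songs = []
for td in songs_table.find_all("td"):
    link = td.find("a")                    # the song title link
    title = link.get_text(strip=True)      # e.g. "ASAP"
    artist = link.find_next_sibling("b")   # the <b> right after "  by "
    songs.append(f"{title} by {artist.get_text(strip=True)}")

print(songs)  # ['ASAP by T.I.', 'ASAP by Eric Bellinger']
```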

Upvotes: 2

HimanshuGahlot

Reputation: 571

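I use BeautifulSoup to locate the two panels and then fall back to a regular expression for the "title by artist" text in the song results.

Here's how to achieve what you want: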

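import re

# bs is the BeautifulSoup object built from the page
elements = bs.find_all('div', attrs={"class": "panel"})
for panel in elements:
    # The first <b> in each panel is its heading, e.g. "Artist results:"
    if panel.div.b.text == "Artist results:":
        artist_a_tag = panel.find_all("a")
    if panel.div.b.text == "Song results:":
        songs_a_tag = panel
artist_results = [a.b.text for a in artist_a_tag]
# Pull (title, artist) pairs out of the song panel's serialized HTML
songs_with_artist = re.findall(r"target=\"_blank\"><b>(.*?)<\/b><\/a>\s+by\s+<b>(.*?)<\/b><br", str(songs_a_tag), re.M|re.I|re.S)
songs_results = [" by ".join(pair) for pair in songs_with_artist]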

Hope this solves your problem; if not, let me know.
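A variation on the same panel-by-heading idea, sketched under the assumption that both panels are present: store each panel in a dict keyed by its heading text. A missing panel then raises a clear KeyError instead of leaving artist_a_tag or songs_a_tag undefined. For the songs it swaps the regex for tag navigation, since a regex over str(tag) depends on exactly how BeautifulSoup re-serializes attributes and void tags such as <br>. The HTML is a simplified copy of the question's two panels (classes and lyric snippets trimmed).

```python
from bs4 import BeautifulSoup

# Simplified copy of the question's two panels
html = """
<div class="panel">
  <div class="panel-heading"><b>Artist results:</b><br><small>[1-3 of 3 total]</small></div>
  <table class="table table-condensed">
    <tr><td>1. <a href="https://www.azlyrics.com/a/asaprocky.html"><b>Asap Rocky</b></a></td></tr>
    <tr><td>2. <a href="https://www.azlyrics.com/a/asapmob.html"><b>Asap Mob</b></a></td></tr>
    <tr><td>3. <a href="https://www.azlyrics.com/a/asapferg.html"><b>Asap Ferg</b></a></td></tr>
  </table>
</div>
<div class="panel">
  <div class="panel-heading"><b>Song results:</b><br><small>[1-5 of 454 total]</small></div>
  <table class="table table-condensed">
    <tr><td>1. <a href="https://www.azlyrics.com/lyrics/ti/asap.html"><b>ASAP</b></a>  by <b>T.I.</b><br><small><strong>Asap</strong> ...</small></td></tr>
    <tr><td>2. <a href="https://www.azlyrics.com/lyrics/ericbellinger/asap.html"><b>ASAP</b></a>  by <b>Eric Bellinger</b><br><small><strong>asap</strong> ...</small></td></tr>
  </table>
</div>
"""
bs = BeautifulSoup(html, "html.parser")

# Map each heading ("Artist results:", "Song results:") to its panel.
# class_="panel" matches the class token exactly, so "panel-heading" divs are skipped.
panels = {}
for panel in bs.find_all("div", class_="panel"):
    heading = panel.find("div", class_="panel-heading").b.get_text(strip=True)
    panels[heading] = panel

artist_results = [a.get_text(strip=True) for a in panels["Artist results:"].find_all("a")]

# Each song cell holds exactly two <b> tags: the title and the artist
songs_results = [
    " by ".join(b.get_text(strip=True) for b in td.find_all("b"))
    for td in panels["Song results:"].find_all("td")
]

print(artist_results)  # ['Asap Rocky', 'Asap Mob', 'Asap Ferg']
print(songs_results)   # ['ASAP by T.I.', 'ASAP by Eric Bellinger']
```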

Upvotes: 0
