sxs

Reputation: 59

How can I extract text from a class within a class using Beautiful Soup in Python?

Good Morning / Afternoon

I would be grateful for some clarity and direction on using Beautiful Soup. My end goal is to have three variables: artist, song and radio station.

Currently I can extract the artist and song together (concatenated), plus the radio station, as one string in the format {str} \n\n artist song\n track hit \n

The code below returns 20 tags, which I can then use to extract the information above. My question is: how can I retrieve the text from the raw HTML below using <div class="table__track-title"> and <div class="table__track-onair">?

These would be combined with <tr class="now_playing_tr">, which I used initially to retrieve the result set of 20 records.

Code used to obtain the HTML subset:

import bs4
import requests

siteurl = 'https://onlineradiobox.com/uk/?cs=uk'
r = requests.get(siteurl)
soup = bs4.BeautifulSoup(r.text, 'html.parser')

# Each result row is a <tr class="now_playing_tr">; this returns 20 of them
x = soup.find_all(class_='now_playing_tr')

for row in x:
    currtext = row.get_text()  # artist, song and station run together in one string
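The run-together string comes from calling get_text() with no arguments, which concatenates every text node in the row. A minimal sketch, assuming each row holds exactly three non-blank text fragments (artist, song, station) as in the HTML sample in this question, is to iterate the row's stripped_strings and unpack them. SAMPLE_ROW is a trimmed, hand-written copy of that sample (the img src is a placeholder), not a live fetch.

```python
from bs4 import BeautifulSoup

# Trimmed copy of one row from the question's HTML; an assumed sample, not a live fetch.
# The img src is a placeholder.
SAMPLE_ROW = """
<table><tr class="now_playing_tr"><td>
  <img alt="Billy Joel - The DownEaster Alexa" src="cover.jpg">
  <div class="table__track-title"><b>Billy Joel</b> The DownEaster Alexa</div>
  <div class="table__track-onair">Cheeky Hits</div>
</td></tr></table>
"""

soup = BeautifulSoup(SAMPLE_ROW, "html.parser")

for row in soup.find_all(class_="now_playing_tr"):
    # stripped_strings yields every non-blank text node, whitespace trimmed,
    # in document order: here the artist, the song and the station
    parts = list(row.stripped_strings)
    if len(parts) == 3:  # skip rows that do not match the assumed layout
        artist, song, station = parts
        print(artist, "|", song, "|", station)
```

If a single delimited string is enough, row.get_text(" | ", strip=True) joins the same fragments instead. Rows with no track on air, or with extra text cells, will not have exactly three fragments, which is why the length check is there.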

HTML:

<button aria-label="Listen live" class="b-play station_play" radioid="uk.cheekyhits"
        radioimg="//cdn.onlineradiobox.com/img/l/2/88872.v2.png" radioname="Cheeky Hits"
        stream="https://stream.zeno.fm/ys9fvbebgwzuv"
        streamtype="mp3" title="Listen to radio"></button>
<img alt="Billy Joel - The DownEaster Alexa" src="https://is2-ssl.mzstatic.com/image/thumb/Music124/v4/bf/b7/db/bfb7dbd8-6d55-d42a-a0f0-3ecc5681cf9c/20UMGIM12176.rgb.jpg/30x30bb.jpg">
<div class="table__track-title"><b>Billy Joel</b> The DownEaster Alexa</div>
<div class="table__track-onair">Cheeky Hits</div>
</img></td></tr>
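A minimal sketch of reading the two inner divs directly, assuming the class names shown in the snippet above (now_playing_tr, table__track-title, table__track-onair) are what the live page serves: call select_one() on each row, take the artist from the <b> tag and the song from the title div's remaining text. The HTML string is an abbreviated sample, and its second row ("Example FM") is hypothetical, added only to exercise the missing-track fallback.

```python
from bs4 import BeautifulSoup

# Abbreviated rows modelled on the question's snippet. The second row is a
# hypothetical station ("Example FM") with nothing on air, to show the fallback.
HTML = """
<table>
<tr class="now_playing_tr"><td>
  <div class="table__track-title"><b>Billy Joel</b> The DownEaster Alexa</div>
  <div class="table__track-onair">Cheeky Hits</div>
</td></tr>
<tr class="now_playing_tr"><td>
  <div class="table__track-onair">Example FM</div>
</td></tr>
</table>
"""


def parse_row(row):
    """Return (artist, song, station) for one <tr class="now_playing_tr"> row."""
    title = row.select_one("div.table__track-title")
    onair = row.select_one("div.table__track-onair")
    station = onair.get_text(strip=True) if onair else "N/A"
    if title is None:  # select_one() returns None when nothing matches
        return "N/A", "N/A", station
    bold = title.find("b")
    artist = bold.get_text(strip=True) if bold else "N/A"
    # The song is the title div's own text (its direct string children),
    # which excludes the text inside the <b> artist tag
    song = "".join(title.find_all(string=True, recursive=False)).strip()
    return artist, song or "N/A", station


soup = BeautifulSoup(HTML, "html.parser")
rows = [parse_row(tr) for tr in soup.select("tr.now_playing_tr")]

for artist, song, station in rows:
    print(f"{station}: {artist} - {song}")
```

Against the live page, the same parse_row() could be applied to each tag returned by soup.find_all(class_='now_playing_tr') in the question's code instead of the sample string.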

Upvotes: 2

Views: 237

Answers (1)

baduker

Reputation: 20052

The main problem here is that not all stations have a track on air, so you have to account for a missing tag.

Here's my take on it:

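import requests

from bs4 import BeautifulSoup
from tabulate import tabulate

# The "lxml" parser needs the lxml package installed (pip install lxml)
soup = BeautifulSoup(
    requests.get("https://onlineradiobox.com/uk/?cs=uk").text,
    "lxml",
)


def check_for_track_on_air(tag) -> list:
    """Return [artist, song] for the track on air, or ["N/A", "N/A"] if there is none."""
    if tag:
        # partition() splits on the first " - " only, so song titles that
        # contain " - " stay in one piece and every row keeps three columns
        artist, _, song = tag.getText(strip=True).partition(" - ")
        return [artist, song or "N/A"]
    return ["N/A", "N/A"]


# One row per station: [station name, artist, song]
stations = [
    [
        station.select_one(
            ".station__title__name, .podcast__title__name"
        ).getText(),
        *check_for_track_on_air(
            station.select_one(
                ".stations__station__track, .podcasts__podcast__track"
            )
        ),
    ] for station
    in soup.find_all("li", class_="stations__station")
]

station_chart = tabulate(
    stations,
    tablefmt="pretty",
    headers=["Station", "Artist", "Song"],
    stralign="left",
)
print(station_chart)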

Output:

+-------------------------------+----------------------------+---------------------------------------+
| Station                       | Artist                     | Song                                  |
+-------------------------------+----------------------------+---------------------------------------+
| Smooth Radio                  | Mark Cohn                  | Walking In Memphis                    |
| BBC Radio 1                   | Hazey                      | Pots & Potions                        |
| Capital FM                    | Belters Only feat. Jazzy   | Make Me Feel Good                     |
| Heart FM                      | Maroon 5                   | This Love                             |
| Classic FM                    | Gioachino Rossini          | La Cenerentola                        |
| BBC Radio London              | Naughty Boy                | La La La (feat. Sam Smith)            |
| BBC Radio 2                   | Simple Minds               | Act Of Love                           |
| BBC Radio 4                   | Nina Simone                | Baltimore                             |
| Dance UK Radio                | Faithless vs David Guetta  | God Is A DJ                           |
| Gold Radio                    | Manfred Mann's Earthband   | Davy's On The Road Again              |
| KISS FM                       | Muni Long                  | Hrs & Hrs                             |
| LBC                           | N/A                        | N/A                                   |
| Energy FM - Dance Music Radio | C-Sixty Four               | On A Good Thing (Full Intention Edit) |
| Radio Caroline                | Badly Drawn Boy            | Once Around The Block                 |
| BBC Radio 6 Music             | Bob Marley & The Wailers   | It Hurts to Be Alone                  |
| BBC Radio 5 live              | N/A                        | N/A                                   |
| Absolute Chillout             | N/A                        | N/A                                   |
| House Nation UK               | Kevin McKay, Katie McHardy | Everywhere (Extended Mix)             |
| BBC Radio 4 Extra             | N/A                        | N/A                                   |
| Absolute Radio                | The Knack                  | My Sharona                            |
| Magic Radio                   | Elton John                 | Rocket Man                            |
| Soul Groove Radio             | N/A                        | N/A                                   |
| BBC Radio 3                   | Darius Milhaud             | Violin Sonata, Op.257                 |
| Jazz FM                       | N/A                        | N/A                                   |
| BBC Radio 1Xtra               | M'Way                      | Run It Up                             |
+-------------------------------+----------------------------+---------------------------------------+

Upvotes: 1
