Reputation: 59
Good Morning / Afternoon
I would be grateful for your help to give further clarity and direction using Beautiful Soup. My end goal is to have three variables: artist, song and radio station.
I can currently extract the artist and song together (concatenated) and the radio station. This is in the format {str} \n\n artist song\n track hit \n
The code I have used below returns 20 tags which I can then use to extract the information above. My question to you all is: how can I retrieve the text from the raw HTML below using <div class="table__track-title"> and <div class="table__track-onair">?
These would be combined with <tr class="now_playing_tr">, which was used initially to retrieve the result set of 20 records.
Code used to obtain the HTML subset:
import bs4, requests, re
siteurl = 'https://onlineradiobox.com/uk/?cs=uk'
r = requests.get(siteurl)
soup = bs4.BeautifulSoup(r.text, 'html.parser')
x = soup.find_all(class_='now_playing_tr')
for i in range(len(x)):
    currtext = x[i].get_text()
HTML:
<button aria-label="Listen live" class="b-play station_play" radioid="uk.cheekyhits"
radioimg="//cdn.onlineradiobox.com/img/l/2/88872.v2.png" radioname="Cheeky Hits"
stream="https://stream.zeno.fm/ys9fvbebgwzuv"
streamtype="mp3" title="Listen to radio"></button>
<img alt="Billy Joel - The DownEaster Alexa" src="https://is2-ssl.mzstatic.com/image/thumb/Music124/v4/bf/b7/db/bfb7dbd8-6d55-d42a-a0f0-3ecc5681cf9c/20UMGIM12176.rgb.jpg/30x30bb.jpg"><div class="table__track-title"><b>Billy Joel</b> The DownEaster Alexa</div>
<div class="table__track-onair">Cheeky Hits</div>
</img></td></tr>
Upvotes: 2
Views: 237
Reputation: 20052
The main problem here is that not all stations have a track on air, so you have to account for a missing tag.
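To make the missing-tag case concrete, here is a minimal sketch on hypothetical inline markup (the station names and track text are made up for illustration; only the class names are taken from the selectors used further down). It relies on `select_one` returning `None` when nothing matches, so calling `get_text` without a guard would raise an `AttributeError`.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second station has no track element at all.
html = """
<ul>
  <li class="stations__station">
    <span class="station__title__name">Station A</span>
    <span class="stations__station__track">Some Artist - Some Song</span>
  </li>
  <li class="stations__station">
    <span class="station__title__name">Station B</span>
  </li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

tracks = []
for station in soup.find_all("li", class_="stations__station"):
    tag = station.select_one(".stations__station__track")
    # select_one returns None when no element matches, so guard before get_text
    tracks.append(tag.get_text(strip=True) if tag is not None else "N/A")

print(tracks)  # ['Some Artist - Some Song', 'N/A']
```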
Here's my take on it:
import requests
from bs4 import BeautifulSoup
from tabulate import tabulate

soup = BeautifulSoup(
    requests.get("https://onlineradiobox.com/uk/?cs=uk").text,
    "lxml",
)

def check_for_track_on_air(tag) -> list:
    if tag:
        return tag.getText(strip=True).split(" - ")
    return ["N/A", "N/A"]

stations = [
    [
        station.select_one(
            ".station__title__name, .podcast__title__name"
        ).getText(),
        *check_for_track_on_air(
            station.select_one(
                ".stations__station__track, .podcasts__podcast__track"
            )
        ),
    ] for station
    in soup.find_all("li", class_="stations__station")
]

station_chart = tabulate(
    stations,
    tablefmt="pretty",
    headers=["Station", "Artist", "Song"],
    stralign="left",
)
print(station_chart)
Output:
+-------------------------------+----------------------------+---------------------------------------+
| Station                       | Artist                     | Song                                  |
+-------------------------------+----------------------------+---------------------------------------+
| Smooth Radio                  | Mark Cohn                  | Walking In Memphis                    |
| BBC Radio 1                   | Hazey                      | Pots & Potions                        |
| Capital FM                    | Belters Only feat. Jazzy   | Make Me Feel Good                     |
| Heart FM                      | Maroon 5                   | This Love                             |
| Classic FM                    | Gioachino Rossini          | La Cenerentola                        |
| BBC Radio London              | Naughty Boy                | La La La (feat. Sam Smith)            |
| BBC Radio 2                   | Simple Minds               | Act Of Love                           |
| BBC Radio 4                   | Nina Simone                | Baltimore                             |
| Dance UK Radio                | Faithless vs David Guetta  | God Is A DJ                           |
| Gold Radio                    | Manfred Mann's Earthband   | Davy's On The Road Again              |
| KISS FM                       | Muni Long                  | Hrs & Hrs                             |
| LBC                           | N/A                        | N/A                                   |
| Energy FM - Dance Music Radio | C-Sixty Four               | On A Good Thing (Full Intention Edit) |
| Radio Caroline                | Badly Drawn Boy            | Once Around The Block                 |
| BBC Radio 6 Music             | Bob Marley & The Wailers   | It Hurts to Be Alone                  |
| BBC Radio 5 live              | N/A                        | N/A                                   |
| Absolute Chillout             | N/A                        | N/A                                   |
| House Nation UK               | Kevin McKay, Katie McHardy | Everywhere (Extended Mix)             |
| BBC Radio 4 Extra             | N/A                        | N/A                                   |
| Absolute Radio                | The Knack                  | My Sharona                            |
| Magic Radio                   | Elton John                 | Rocket Man                            |
| Soul Groove Radio             | N/A                        | N/A                                   |
| BBC Radio 3                   | Darius Milhaud             | Violin Sonata, Op.257                 |
| Jazz FM                       | N/A                        | N/A                                   |
| BBC Radio 1Xtra               | M'Way                      | Run It Up                             |
+-------------------------------+----------------------------+---------------------------------------+
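For the markup shown in the question itself (the "now playing" rows rather than the station list), a rough sketch of pulling out the three values could look like the following. It runs on a trimmed copy of the question's row, wrapped in a `<table>` here only so the snippet is self-contained. The helper names `text_or_na` and `parse_row` are made up for this example, and the row class is spelled `now_playing_tr` as in the question's code. It assumes the artist is always the `<b>` element and the song is the loose text that follows it inside the same `div`.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the row from the question (button and img omitted).
html = """
<table><tr class="now_playing_tr"><td>
<div class="table__track-title"><b>Billy Joel</b> The DownEaster Alexa</div>
<div class="table__track-onair">Cheeky Hits</div>
</td></tr></table>
"""

def text_or_na(tag):
    """Return the stripped text of a tag, or "N/A" if the tag is missing."""
    return tag.get_text(strip=True) if tag is not None else "N/A"

def parse_row(row):
    """Return (artist, song, station) for one now-playing row."""
    title = row.select_one(".table__track-title")
    station = text_or_na(row.select_one(".table__track-onair"))
    if title is None:
        return "N/A", "N/A", station
    artist = text_or_na(title.find("b"))
    # Direct text children of the div only, which leaves out the <b> artist
    song = "".join(title.find_all(string=True, recursive=False)).strip() or "N/A"
    return artist, song, station

soup = BeautifulSoup(html, "html.parser")
rows = [parse_row(row) for row in soup.find_all(class_="now_playing_tr")]
print(rows)  # [('Billy Joel', 'The DownEaster Alexa', 'Cheeky Hits')]
```

Reading the artist from the `<b>` tag avoids having to split the concatenated string on a separator, which matters when a song title itself contains " - ".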
Upvotes: 1