Reputation: 41
I'm trying to map an Apple Podcasts episode id to the corresponding entry in the podcast's RSS feed. Say I have the episode with the following link https://podcasts.apple.com/us/podcast/the-numberphile-podcast/id1441474794?i=1000475383420 so podcast_id=1441474794 and episode_id=1000475383420. I'm able to get the RSS feed from the podcast id with this code:
from urllib.request import urlopen
import json
import xmltodict

podcast_id = "1441474794"
ITUNES_URL = 'https://itunes.apple.com/lookup?id='

# Look up the podcast on iTunes to find its RSS feed URL
with urlopen(ITUNES_URL + podcast_id) as response:
    res = json.load(response)
feedUrl = res['results'][0]['feedUrl']
print(feedUrl)

# Download the RSS feed and convert the XML to a dict
with urlopen(feedUrl) as response:
    res = xmltodict.parse(response)

# Dump the parsed feed as JSON for inspection
with open('res.json', "w") as f:
    f.write(json.dumps(res))
This gives me JSON with some general info about the podcast and an array of all the episodes. For a specific episode, the entry looks like this:
"item": [
    {
        "title": "The Parker Quiz - with Matt Parker",
        "dc:creator": "Brady Haran",
        "pubDate": "Thu, 21 May 2020 16:59:08 +0000",
        "link": "https://www.numberphile.com/podcast/matt-parker-quiz",
        "guid": {
            "@isPermaLink": "false",
            "#text": "5b2cf993266c07b1ca7a812f:5bd2f1a04785d353e1b39d76:5ec683354f70a700f9f04555"
        },
        "description": "some description here...",
        "itunes:author": "Numberphile Podcast",
        "itunes:subtitle": "Matt Parker takes a quiz prepared by Brady. The YouTube version of this quiz contains a few visuals at https://youtu.be/hMwQwppzrys",
        "itunes:explicit": "no",
        "itunes:duration": "00:55:34",
        "itunes:image": {
            "@href": "https://images.squarespace-cdn.com/content/5b2cf993266c07b1ca7a812f/1541821254439-PW3116VHYDC1Y3V7GI0A/podcast_square2_2000x2000.jpg?format=1500w&content-type=image%2Fjpeg"
        },
        "itunes:title": "The Parker Quiz - with Matt Parker",
        "enclosure": {
            "@url": "https://traffic.libsyn.com/secure/numberphile/numberphile_parker_quiz.mp3",
            "@type": "audio/mpeg"
        },
        "media:content": {
            "@url": "https://traffic.libsyn.com/secure/numberphile/numberphile_parker_quiz.mp3",
            "@type": "audio/mpeg",
            "@isDefault": "true",
            "@medium": "audio",
            "media:title": {
                "@type": "plain",
                "#text": "The Parker Quiz - with Matt Parker"
            }
        }
    },
    ...]
The episode_id=1000475383420 doesn't appear anywhere in the RSS feed response, so I can't tell which entry corresponds to this id. Is there a clean way to find the episode by id? For example, an Apple API call that takes the episode id and returns info about the episode, which I could then match against the RSS feed entry.
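For reference, the two ids can be pulled out of the share link programmatically. This is only a minimal sketch: it assumes the link always has the shape shown above (last path segment `id<digits>`, episode id in the `i` query parameter), and `parse_apple_podcast_url` is a name made up for illustration.

```python
from urllib.parse import urlparse, parse_qs


def parse_apple_podcast_url(url):
    """Return (podcast_id, episode_id) from an Apple Podcasts share link.

    episode_id is None when the link has no "i" query parameter.
    Assumes the link shape shown in the question; other shapes raise ValueError.
    """
    parts = urlparse(url)
    # The last path segment is expected to look like "id1441474794".
    last_segment = parts.path.rstrip("/").rsplit("/", 1)[-1]
    if not last_segment.startswith("id") or not last_segment[2:].isdigit():
        raise ValueError(f"no podcast id found in {url!r}")
    podcast_id = last_segment[2:]
    # The episode id, when present, is the "i" query parameter.
    episode_id = parse_qs(parts.query).get("i", [None])[0]
    return podcast_id, episode_id


url = ("https://podcasts.apple.com/us/podcast/"
       "the-numberphile-podcast/id1441474794?i=1000475383420")
podcast_id, episode_id = parse_apple_podcast_url(url)
print(podcast_id, episode_id)
```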
Upvotes: 0
Views: 648
Reputation: 1859
Yeah, the second response is a general-purpose podcast RSS feed, independent of Apple or any other source. I wouldn't expect it to ever contain Apple-specific or podcast-player-specific data.
The best I've been able to do is a title match based on the JSON-LD metadata in the podcast episode's HTML page. JSON-LD is semantic data (as opposed to presentation), so it's much less likely to change. I use the extruct library to have some hope of extracting meaningful metadata, and jsonpath_rw to query the resulting JSON (amazing library):
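import extruct
from jsonpath_rw import parse
# itunes_podcast_episode_html: HTML source of the Apple Podcasts episode page, fetched beforehand
metadata = extruct.extract(itunes_podcast_episode_html, uniform=True)
# Select the 'name' field of every JSON-LD object found on the page
title_pattern = "[json-ld][*]['name']"
expr = parse(title_pattern)
# Take the first match as the episode title
title = [match.value for match in expr.find(metadata)][0]
print(f"itunes podcast episode name = '{title}'")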
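Once you have the title, the match against the feed could look roughly like the sketch below. It assumes the feed was parsed with xmltodict as in the question (so episodes live under `rss` -> `channel` -> `item`), and that titles only differ by case or whitespace. The helper names are made up for illustration, and the second feed entry is an invented placeholder.

```python
def feed_items(feed):
    """Return the feed's episodes as a list.

    xmltodict yields a single dict instead of a list when the feed
    contains only one <item>, so normalize both cases to a list.
    """
    items = feed["rss"]["channel"].get("item", [])
    return items if isinstance(items, list) else [items]


def normalize(text):
    # Collapse runs of whitespace and ignore case.
    return " ".join(text.split()).casefold()


def find_items_by_title(feed, title):
    """Return every feed item whose <title> matches after normalization."""
    wanted = normalize(title)
    return [
        item for item in feed_items(feed)
        if isinstance(item.get("title"), str)
        and normalize(item["title"]) == wanted
    ]


# Tiny stand-in for the parsed feed; the second entry is invented.
feed = {"rss": {"channel": {"item": [
    {"title": "The Parker Quiz - with Matt Parker",
     "link": "https://www.numberphile.com/podcast/matt-parker-quiz"},
    {"title": "Some Other Episode (placeholder)",
     "link": "https://example.com/other"},
]}}}

matches = find_items_by_title(feed, "the parker quiz  - with Matt Parker ")
print(len(matches), matches[0]["link"])
```

A list is returned rather than a single item because titles are not guaranteed to be unique; if you get more than one match you would need a second field (e.g. the publication date) to disambiguate.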
Upvotes: 0
Reputation: 1087
The element that is supposed to uniquely identify an episode in a podcast RSS feed is <guid>. The Apple Podcasts Connect Guide to RSS has some related info that might be helpful.
If you can get hold of the <guid>, then you can look up the episode in the feed.
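A rough sketch of that lookup is below. Getting hold of the guid is the uncertain part: as far as I know, the iTunes lookup endpoint used in the question also accepts `entity=podcastEpisode` and returns one result per episode with `trackId` and `episodeGuid` fields, but treat that parameter and those field names as assumptions to verify against a real response (the endpoint may also cap how many episodes it returns). The function names are made up, and the `lookup` dict in the demo is a hand-written stand-in, not real API output.

```python
import json
from urllib.request import urlopen

# Assumption: the lookup endpoint accepts entity=podcastEpisode.
LOOKUP_URL = ("https://itunes.apple.com/lookup"
              "?id={podcast_id}&entity=podcastEpisode&limit={limit}")


def fetch_episode_lookup(podcast_id, limit=200):
    """Fetch the episode listing for a podcast (network call, not run here)."""
    with urlopen(LOOKUP_URL.format(podcast_id=podcast_id, limit=limit)) as response:
        return json.load(response)


def episode_guid(lookup, episode_id):
    """Return the (assumed) episodeGuid of the result whose trackId matches."""
    for result in lookup.get("results", []):
        if str(result.get("trackId")) == str(episode_id):
            return result.get("episodeGuid")
    return None


def guid_text(item):
    # xmltodict gives a dict when <guid> has attributes, else a plain string.
    guid = item.get("guid")
    return guid.get("#text") if isinstance(guid, dict) else guid


def find_item_by_guid(feed, guid):
    """Return the feed item whose <guid> equals guid, or None."""
    items = feed["rss"]["channel"].get("item", [])
    if not isinstance(items, list):  # a single <item> parses to a dict
        items = [items]
    for item in items:
        if guid_text(item) == guid:
            return item
    return None


# Hand-written stand-ins shaped like the assumed responses.
GUID = "5b2cf993266c07b1ca7a812f:5bd2f1a04785d353e1b39d76:5ec683354f70a700f9f04555"
lookup = {"results": [
    {"kind": "podcast", "trackId": 1441474794},
    {"kind": "podcast-episode", "trackId": 1000475383420, "episodeGuid": GUID},
]}
feed = {"rss": {"channel": {"item": [
    {"title": "The Parker Quiz - with Matt Parker",
     "guid": {"@isPermaLink": "false", "#text": GUID}},
]}}}

guid = episode_guid(lookup, "1000475383420")
item = find_item_by_guid(feed, guid)
print(item["title"])
```

With real data you would replace the stand-ins with `fetch_episode_lookup(podcast_id)` and the xmltodict-parsed feed from the question.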
A less reliable option would be to try the episode's <link> tag. On the URL you provided, there is a link toward the end of the page named 'Episode Website'. That may also give you a unique key to the episode in the RSS feed, but it may not work as expected in all cases, e.g. if the publisher of the podcast RSS put the same URL in every episode instead of a unique URL per episode.
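If you go the <link> route, it is worth guarding against exactly that case. A minimal sketch, assuming an xmltodict-parsed feed as in the question and that you already have the 'Episode Website' URL. `find_item_by_unique_link` is a made-up name, and the two `example.com` entries are invented to show the shared-link situation.

```python
def normalize_link(link):
    # Ignore surrounding whitespace and a trailing slash.
    return link.strip().rstrip("/")


def find_item_by_unique_link(feed, link):
    """Return the item whose <link> matches, but only if exactly one does.

    Returns None when no item matches or when several episodes share
    the same link (in which case the link is not a usable key).
    """
    items = feed["rss"]["channel"].get("item", [])
    if not isinstance(items, list):  # a single <item> parses to a dict
        items = [items]
    wanted = normalize_link(link)
    matches = [
        item for item in items
        if isinstance(item.get("link"), str)
        and normalize_link(item["link"]) == wanted
    ]
    return matches[0] if len(matches) == 1 else None


# Stand-in feed; the two example.com entries are invented.
feed = {"rss": {"channel": {"item": [
    {"title": "The Parker Quiz - with Matt Parker",
     "link": "https://www.numberphile.com/podcast/matt-parker-quiz"},
    {"title": "Placeholder A", "link": "https://example.com/show"},
    {"title": "Placeholder B", "link": "https://example.com/show/"},
]}}}

hit = find_item_by_unique_link(
    feed, "https://www.numberphile.com/podcast/matt-parker-quiz/")
print(hit["title"])
print(find_item_by_unique_link(feed, "https://example.com/show"))
```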
Upvotes: 0