VK13

Reputation: 41

Mapping Apple podcast episode id to rss feed element

I'm trying to map an Apple Podcasts episode id to the corresponding entry in the podcast's RSS feed. Say I have the episode with the following link: https://podcasts.apple.com/us/podcast/the-numberphile-podcast/id1441474794?i=1000475383420. Here podcast_id=1441474794 and episode_id=1000475383420. I can already get the RSS feed from the podcast id with this code:

from urllib.request import urlopen
import json
import xmltodict

podcast_id = "1441474794"
ITUNES_URL = 'https://itunes.apple.com/lookup?id='

# Look up the podcast in the iTunes API to get its RSS feed URL
with urlopen(ITUNES_URL + podcast_id) as response:
    res = json.load(response)
    feedUrl = res['results'][0]['feedUrl']
    print(feedUrl)

# Download the RSS feed and parse the XML into a dict
with urlopen(feedUrl) as response:
    res = xmltodict.parse(response)

# Save the parsed feed as JSON
with open('res.json', "w") as f:
    f.write(json.dumps(res))

This gives me a JSON file with some general info about the podcast and an array with all the episodes. For a specific episode the result looks like this:

"item": [
        {
          "title": "The Parker Quiz - with Matt Parker",
          "dc:creator": "Brady Haran",
          "pubDate": "Thu, 21 May 2020 16:59:08 +0000",
          "link": "https://www.numberphile.com/podcast/matt-parker-quiz",
          "guid": {
            "@isPermaLink": "false",
            "#text": "5b2cf993266c07b1ca7a812f:5bd2f1a04785d353e1b39d76:5ec683354f70a700f9f04555"
          },
          "description": "some description here...",
          "itunes:author": "Numberphile Podcast",
          "itunes:subtitle": "Matt Parker takes a quiz prepared by Brady. The YouTube version of this quiz contains a few visuals at https://youtu.be/hMwQwppzrys",
          "itunes:explicit": "no",
          "itunes:duration": "00:55:34",
          "itunes:image": {
            "@href": "https://images.squarespace-cdn.com/content/5b2cf993266c07b1ca7a812f/1541821254439-PW3116VHYDC1Y3V7GI0A/podcast_square2_2000x2000.jpg?format=1500w&content-type=image%2Fjpeg"
          },
          "itunes:title": "The Parker Quiz - with Matt Parker",
          "enclosure": {
            "@url": "https://traffic.libsyn.com/secure/numberphile/numberphile_parker_quiz.mp3",
            "@type": "audio/mpeg"
          },
          "media:content": {
            "@url": "https://traffic.libsyn.com/secure/numberphile/numberphile_parker_quiz.mp3",
            "@type": "audio/mpeg",
            "@isDefault": "true",
            "@medium": "audio",
            "media:title": {
              "@type": "plain",
              "#text": "The Parker Quiz - with Matt Parker"
            }
          }
        },
...]

The episode_id=1000475383420 doesn't appear anywhere in the RSS feed response, so I can't tell which episode corresponds to this id. Is there a clean way to find the episode by id? For example, an Apple API call that takes the episode id and returns info about the episode, which I could then match against the RSS feed entry.

Upvotes: 0

Views: 648

Answers (2)

Julian H

Reputation: 1859

Yeah, the second response is a general-purpose podcast RSS feed, independent of Apple or other sources. I wouldn't expect it to ever contain Apple-specific or podcast-player-specific identifiers.

The best I've been able to do is a title match based on the JSON-LD metadata on the podcast episode HTML page. JSON-LD is semantic data (as opposed to presentation), so it is much less likely to change. I use the extruct library for some semblance of hope of extracting meaningful metadata, and jsonpath_rw for querying the resulting JSON (amazing library).

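import extruct
from jsonpath_rw import parse

# itunes_podcast_episode_html: HTML text of the Apple Podcasts episode page, fetched beforehand
metadata = extruct.extract(itunes_podcast_episode_html, uniform=True)
title_pattern = "[json-ld][*]['name']"

# Take the first 'name' found in the JSON-LD metadata
expr = parse(title_pattern)
title = [match.value for match in expr.find(metadata)][0]
print(f"itunes podcast episode name = '{title}'")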
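
Once you have the title, the match against the feed could look something like the sketch below. It assumes `items` is the list of episode dicts from the xmltodict-parsed feed (for a standard RSS 2.0 feed that should be `res['rss']['channel']['item']` in the question's code). The names `normalize_title` and `find_items_by_title` are made up for illustration, and the second sample entry is invented. It returns a list because titles aren't guaranteed to be unique within a feed.

```python
def normalize_title(title):
    """Lowercase and collapse whitespace so small formatting differences don't break the match."""
    return " ".join((title or "").casefold().split())


def find_items_by_title(items, title):
    """Return every feed item whose <title> matches `title` after normalization."""
    wanted = normalize_title(title)
    return [item for item in items if normalize_title(item.get("title")) == wanted]


# Tiny stand-in for the parsed feed items (first entry shaped like the question's JSON,
# second entry is invented for the example)
items = [
    {"title": "The Parker Quiz - with Matt Parker",
     "link": "https://www.numberphile.com/podcast/matt-parker-quiz"},
    {"title": "Some Other Episode", "link": "https://example.com/other"},
]

matches = find_items_by_title(items, "  the parker quiz -  with Matt Parker ")
print([m["link"] for m in matches])
```

If the list comes back with more than one entry, you'd need a second field (pubDate, duration) to break the tie.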

Upvotes: 0

Danoz

Reputation: 1087

The element/tag that is supposed to uniquely identify an episode in a podcast RSS feed is:

<guid>

Here is some related info from the Apple Podcasts Connect Guide to RSS that might be helpful.

If you can get hold of the episode's <guid>, you can look the episode up in the feed.
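
As a rough sketch of that lookup, assuming the feed was parsed with xmltodict as in the question and has the usual rss > channel > item layout: `guid_text` and `find_item_by_guid` are hypothetical helper names, and the second sample item is invented. How you obtain the guid for a given Apple episode id is outside this sketch.

```python
def guid_text(item):
    """Return an item's <guid> as a string; xmltodict gives a dict when the tag has attributes."""
    guid = item.get("guid")
    if isinstance(guid, dict):
        return guid.get("#text")
    return guid


def find_item_by_guid(feed, guid):
    """Return the feed item whose <guid> equals `guid`, or None if there is no such item."""
    items = feed["rss"]["channel"].get("item", [])
    if isinstance(items, dict):  # xmltodict returns a single dict when there is only one <item>
        items = [items]
    for item in items:
        if guid_text(item) == guid:
            return item
    return None


# Minimal stand-in for the parsed feed, using the guid shown in the question;
# the second item is invented to show the plain-string guid case
feed = {"rss": {"channel": {"item": [
    {"title": "The Parker Quiz - with Matt Parker",
     "guid": {"@isPermaLink": "false",
              "#text": "5b2cf993266c07b1ca7a812f:5bd2f1a04785d353e1b39d76:5ec683354f70a700f9f04555"}},
    {"title": "Made-up episode", "guid": "plain-string-guid"},
]}}}

episode = find_item_by_guid(
    feed, "5b2cf993266c07b1ca7a812f:5bd2f1a04785d353e1b39d76:5ec683354f70a700f9f04555")
print(episode["title"])
```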

A less reliable option would be to try the <link> tag for the episode. On the page you linked, there is a link toward the end of the page named 'Episode Website'.

That may also give you a unique key for the episode in the RSS feed, but it may not work as you would expect in all cases, e.g. if the creator/publisher of the podcast RSS simply put the same URL in every episode instead of a unique URL per episode.
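
To guard against that case, a link-based lookup can refuse to answer when the link is not unique. A minimal sketch, assuming `items` is the list of xmltodict-parsed episode dicts; `find_item_by_link` is a hypothetical name and the sample data (apart from the first link, taken from the question) is invented:

```python
def find_item_by_link(items, link):
    """Return the single item whose <link> equals `link`; None if zero or several items match."""
    matches = [item for item in items if item.get("link") == link]
    return matches[0] if len(matches) == 1 else None


# Feed where every episode has its own link
unique_links = [
    {"title": "The Parker Quiz - with Matt Parker",
     "link": "https://www.numberphile.com/podcast/matt-parker-quiz"},
    {"title": "Invented episode", "link": "https://example.com/invented-episode"},
]

# Feed where the publisher reused the same link for every episode (invented data)
shared_links = [
    {"title": "Episode A", "link": "https://example.com/podcast"},
    {"title": "Episode B", "link": "https://example.com/podcast"},
]

hit = find_item_by_link(unique_links, "https://www.numberphile.com/podcast/matt-parker-quiz")
ambiguous = find_item_by_link(shared_links, "https://example.com/podcast")
print(hit["title"] if hit else None, ambiguous)
```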

Upvotes: 0
