Justin Opolony

Reputation: 101

How to detect updates in podcast feeds?

I have a large set of podcast feed URLs which I'm periodically polling to check for updates. I'm really struggling to find a robust way to detect if a feed has changed that doesn't have any false positives. I'd like to be able to detect not just if there is a new episode, but also if an existing episode was updated.

RSS and Atom feeds provide pubDate, lastBuildDate or updated elements. However, I'm finding that these are frequently misused: many feeds insert the current date and time into these fields on every request. This makes them difficult to rely on to detect changes.

My next thought was to strip all date information from each feed, then MD5-hash the remaining contents. I can then compare the hashes between polls to detect changes to the feeds.
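A minimal sketch of that strip-and-hash idea, assuming the feed is well-formed XML and that the date fields are the pubDate, lastBuildDate, updated and published elements. The names `DATE_TAGS`, `local_name` and `feed_hash` are made up for illustration:

```python
import hashlib
import xml.etree.ElementTree as ET

# Assumed set of date-carrying elements (RSS and Atom); extend as needed.
DATE_TAGS = {"pubDate", "lastBuildDate", "updated", "published"}


def local_name(tag):
    """Drop an XML namespace prefix such as '{http://www.w3.org/2005/Atom}'."""
    return tag.rsplit("}", 1)[-1]


def feed_hash(xml_text):
    """Return an MD5 hex digest of the feed with all date elements removed."""
    root = ET.fromstring(xml_text)
    # Snapshot the element list first, because children are removed while looping.
    for parent in list(root.iter()):
        for child in list(parent):
            if local_name(child.tag) in DATE_TAGS:
                parent.remove(child)
    return hashlib.md5(ET.tostring(root)).hexdigest()
```

Two polls whose only difference is a date element then produce the same digest, while any other difference in the feed text still changes it.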

This seems to work in about 90% of cases. However, there are still hundreds of podcasts that insert other dynamic data into their feeds.

One podcast has the following as their podcast cover art:

http://erikglassman.hipcast.com/albumart/1000.1439649026.jpg

Here 1439649026 is, I assume, a timestamp. This second number changes with each request for their feed.

This is starting to seem like a losing battle. If I can't reliably trust the date fields of a podcast feed, and if some percentage of podcasts insert dynamic data into their feed text, how can I reliably detect changes to a feed in a robust way?

Upvotes: 4

Views: 590

Answers (1)

Dave Winer

Reputation: 1917

Everything you say is true, so it's not a good idea to try to detect changes at the feed level. Instead, look for them at the item level.

That generally works. If it doesn't, the feed can't be used by anyone, so the source of the feed is likely to have fixed any problem. That's why I think it works so well.

I've been writing feed readers for as long as they have existed. My current product is called River4. It's available as open source under the MIT License, so you can use it as example code for this and other issues.

This is where it checks if an item is new:

https://github.com/scripting/river4/blob/master/river4.js#L1411

That line number might move around as the code changes, so look for a routine called getItemGuid. It shows you how to get a value that uniquely identifies the item. I use this code for my podcatcher, http://podcatch.com/, and it seems to catch the new items without producing false positives.
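For readers who want the general shape of item-level detection without reading the River4 source, here is a rough Python sketch. It is not River4's code and does not reproduce getItemGuid. The fallback order (guid, then enclosure URL, then link), the fields that go into the fingerprint, and the names `item_key`, `item_fingerprint` and `diff_feed` are all assumptions made for illustration, and it only handles RSS 2.0 `item` elements:

```python
import hashlib
import xml.etree.ElementTree as ET


def item_key(item):
    """Pick a stable identifier (assumed order: guid, enclosure URL, link)."""
    guid = (item.findtext("guid") or "").strip()
    if guid:
        return guid
    enclosure = item.find("enclosure")
    if enclosure is not None and enclosure.get("url"):
        return enclosure.get("url")
    return (item.findtext("link") or "").strip()


def item_fingerprint(item):
    """Hash only the fields whose change should count as an episode update."""
    enclosure = item.find("enclosure")
    parts = [
        (item.findtext("title") or "").strip(),
        (item.findtext("description") or "").strip(),
        enclosure.get("url", "") if enclosure is not None else "",
    ]
    return hashlib.md5("\x00".join(parts).encode("utf-8")).hexdigest()


def diff_feed(xml_text, seen):
    """Compare a polled feed against `seen` (key -> fingerprint) and update it.

    Returns (new_keys, updated_keys).
    """
    new, updated = [], []
    for item in ET.fromstring(xml_text).iter("item"):
        key = item_key(item)
        if not key:
            continue  # nothing stable to identify this item by
        fingerprint = item_fingerprint(item)
        if key not in seen:
            new.append(key)
        elif seen[key] != fingerprint:
            updated.append(key)
        seen[key] = fingerprint
    return new, updated
```

Because channel-level elements such as the cover art URL and the date fields never enter the comparison, a feed that regenerates them on every request no longer registers as changed.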

Hope this helps! :-)

Upvotes: 3
