Oli

Reputation: 1782

How to parse RSS feeds / XML in a shell script

I'd like to parse RSS feeds and download podcasts on my ReadyNAS, which is running 24/7 anyway.

So I'm thinking about having a shell script that periodically checks the feeds and spawns wget to download the files.

What is the best way to do the parsing?

Thanks!

Upvotes: 13

Views: 20268

Answers (5)

kenorb

Reputation: 166467

I wrote the following simple script for downloading files listed in an Amazon S3 XML feed; the same pattern can be adapted to other kinds of XML files:

#!/bin/bash
#
# Download all files from the Amazon feed
#
# Usage:
#  ./dl_amazon_feed_files.sh http://example.s3.amazonaws.com/
# Note: Don't forget the trailing slash
#

# -I% already runs one command per input line; combining it with -L1 makes GNU xargs drop -I.
wget -qO- "$1" | grep -o '<Key>[^<]*' | grep -o "[^>]*$" | xargs -I% wget -c "$1%"

This is a similar approach to @leo's answer.

Upvotes: 1

leo

Reputation: 3749

Sometimes a simple one-liner with standard shell commands can be enough for this:

 wget -q -O- "http://www.rss-specifications.com/rss-podcast.xml" | grep -o '<enclosure url="[^"]*' | grep -o '[^"]*$' | xargs wget -c

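Sure, this does not work in every case (for example, when url is not the first attribute of the enclosure tag or the attribute is single-quoted), but it's often good enough.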
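
A slightly more tolerant variant of the same idea could look like the sketch below. It is still a regex hack rather than a real XML parser, and it assumes each enclosure tag sits on a single line. The function name extract_enclosures is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read a feed on stdin, print one enclosure URL per line.
extract_enclosures() {
    # 1. Isolate every <enclosure ...> tag, wherever url appears inside it.
    # 2. Keep the url attribute, accepting double or single quotes.
    # 3. Strip the 'url="' prefix and decode the common &amp; entity.
    grep -o '<enclosure[^>]*>' |
        grep -o "url=[\"'][^\"']*" |
        sed -e "s/^url=[\"']//" -e 's/&amp;/\&/g'
}
```

It can then be plugged into the same pipeline, for example: wget -qO- "http://www.rss-specifications.com/rss-podcast.xml" | extract_enclosures | wget -c -i -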

Upvotes: 24

Giacomo

Reputation: 11247

You can use xsltproc (part of libxslt, which is built on libxml2) and write a simple XSL stylesheet that parses the RSS and outputs a list of links.
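
As a rough sketch of what that might look like, assuming xsltproc is installed and the feed has already been saved to a local file: the helper name list_enclosures and the stylesheet are illustrative, not taken from any existing project.

```shell
#!/bin/sh
# Hypothetical helper: print the url attribute of every <enclosure>
# element in the RSS file given as the first argument.
list_enclosures() {
    xsl=$(mktemp) || return 1
    # Minimal XSLT 1.0 stylesheet: plain-text output, one URL per line.
    cat > "$xsl" <<'EOF'
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:for-each select="//enclosure">
      <xsl:value-of select="@url"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
EOF
    xsltproc "$xsl" "$1"
    status=$?
    rm -f "$xsl"
    return $status
}
```

Because a real XML parser does the work here, attribute order, quoting style, and entities such as &amp; in URLs are handled without extra effort. Usage could be as simple as: list_enclosures feed.xml | wget -c -i -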

Upvotes: 0

cddr

Reputation: 300

Do you have access to awk? Maybe you could use XMLGawk.

Upvotes: 2

Oli

Reputation: 1782

I've read about XMLStarlet here and there.

But is there a port available for the ReadyNAS NV+?

Upvotes: 1
