Reputation: 557
I am trying to extract the title and description of each item in an RSS feed. I have written the following script to return all the titles in the feed, but it returns only the first title from the XML:
curl "http://www.dailystar.com.lb/RSS.aspx?id=113" 2>/dev/null | grep -E -o "<title>(.*)</title>" |sed -e 's,.*<title>\(.*\)</title>.*,\1,g' | less
How can I also extract the descriptions?
Upvotes: 1
Views: 340
Reputation: 97918
First put each title and description on its own line. Here is an example:
curl "http://www.dailystar.com.lb/RSS.aspx?id=113" 2>/dev/null | \
grep -E -o "<title>(.*)</title>" | \
sed -e 's,<\(title\|description\)>,\n<\1>,g' |
sed -n 's,.*<title>\(.*\)</title>.*,\1,gp'
To print the descriptions as well as the titles:
curl "http://www.dailystar.com.lb/RSS.aspx?id=113" 2>/dev/null | \
grep -E -o "<title>(.*)</title>" | \
sed -e 's,<\(title\|description\)>,\n<\1>,g' | \
sed 's,<title>\([^<]*\)</title>,T:\1,' | \
sed 's,<description>\([^<]*\)</description>,D:\1,' | \
sed -n 's/[DT]://p'
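The split-then-extract idea can be tried offline on a small made-up feed. This is only a sketch under a few assumptions: GNU sed (for \| and for \n in the replacement), a hypothetical helper name titles_and_descriptions, and element text that contains no "<". Real feeds often wrap descriptions in CDATA, which this pattern would skip.

```shell
# Hypothetical helper: print title and description text, one per line, from stdin.
# Assumes GNU sed (\| alternation and \n in the replacement are GNU extensions).
titles_and_descriptions() {
  # Step 1: start a new line before every <title> or <description> tag.
  # Step 2: print only the text of lines that begin with one of those elements.
  sed -e 's,<\(title\|description\)>,\n<\1>,g' |
    sed -n -e 's,^<title>\([^<]*\)</title>.*,\1,p' \
           -e 's,^<description>\([^<]*\)</description>.*,\1,p'
}

# Made-up sample: two items on a single line, as many feeds are served.
sample='<item><title>First story</title><description>Summary one</description></item><item><title>Second story</title><description>Summary two</description></item>'

printf '%s\n' "$sample" | titles_and_descriptions
```

On the sample this prints each title followed by its description. To run it against the real feed, pipe the curl output into titles_and_descriptions instead of the sample.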
Upvotes: 1
Reputation: 784938
You can use grep -P:
curl "http://www.dailystar.com.lb/RSS.aspx?id=113" 2>/dev/null |\
grep -oP "<title>\K[\s\S]*?(?=</title>)"
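To see how that pattern behaves without hitting the network, here is a small sketch on a made-up one-line feed. It assumes GNU grep built with PCRE support; extract_titles is a hypothetical helper name. \K drops the opening tag from the reported match, .*? stops at the nearest closing tag, and the lookahead (?=</title>) checks for the closing tag without including it.

```shell
# Hypothetical helper: print the text of every <title> element read from stdin.
# Requires GNU grep with PCRE support (-P); titles must not span several lines.
extract_titles() {
  grep -oP '<title>\K.*?(?=</title>)'
}

# Made-up sample with two titles on one line, the case that trips up greedy matching.
sample='<rss><channel><title>Feed</title><item><title>First story</title><description>Summary one</description></item></channel></rss>'

printf '%s\n' "$sample" | extract_titles
```

Swapping title for description in the pattern should give the descriptions the same way, as long as each one sits on a single line.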
Upvotes: 1
Reputation: 32189
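You should use a non-greedy match (.*?) instead of a greedy match (.*) to get all the titles. Non-greedy quantifiers are a Perl-regex feature, so this needs grep -P rather than grep -E, and sed does not support them, so the sed step keeps a plain .* (safe here because grep -o has already put each title on its own line):
curl "http://www.dailystar.com.lb/RSS.aspx?id=113" 2>/dev/null | grep -o -P "<title>(.*?)</title>" | sed -e 's,<title>\(.*\)</title>,\1,' | less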
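The difference between the two kinds of matching is easy to check on a tiny made-up input. This sketch assumes GNU grep with -P available; the variable names are just for illustration.

```shell
# Made-up input: two title elements on a single line.
sample='<title>One</title><title>Two</title>'

# Greedy: .* runs to the LAST </title> on the line, so there is a single match.
greedy=$(printf '%s\n' "$sample" | grep -oE '<title>.*</title>')

# Non-greedy (PCRE only): .*? stops at the NEAREST </title>, one match per element.
lazy=$(printf '%s\n' "$sample" | grep -oP '<title>.*?</title>')

printf 'greedy:\n%s\n' "$greedy"
printf 'non-greedy:\n%s\n' "$lazy"
```

The greedy run reports the whole line as one match, while the non-greedy run reports each element on its own line.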
Upvotes: 0