web2dev

Reputation: 557

Grep and sed returning only first match

I am trying to extract the title and description from an RSS feed. I have written the following script to return all the titles in the feed, but it returns only the first title from the XML:

curl "http://www.dailystar.com.lb/RSS.aspx?id=113" 2>/dev/null  | grep -E -o "<title>(.*)</title>" |sed -e 's,.*<title>\(.*\)</title>.*,\1,g' | less

How can I also extract the description?

Upvotes: 1

Views: 340

Answers (3)

perreal

Reputation: 97918

First, put each title and description on its own line, then extract the titles. The \n in the replacement and the \| alternation are GNU sed extensions. Here is an example:

curl "http://www.dailystar.com.lb/RSS.aspx?id=113" 2>/dev/null  | \
      grep -E -o "<title>(.*)</title>" | \
      sed -e 's,<\(title\|description\)>,\n<\1>,g' | \
      sed -n 's,.*<title>\(.*\)</title>.*,\1,gp'
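To see why the split matters, here is a small offline sketch that runs the same kind of sed extraction with and without the split. The one-line feed is made up (no network access needed), and GNU sed is assumed.

```shell
#!/usr/bin/env bash
# Sketch: why each tag should be on its own line before extracting with sed.
# Assumes GNU sed ("\n" in the replacement, "\|" alternation) and a made-up
# one-line feed; real feeds may be laid out differently.
feed='<rss><channel><item><title>First</title><description>One</description></item><item><title>Second</title><description>Two</description></item></channel></rss>'

# Without splitting, the whole feed is a single line, so sed substitutes
# (and prints) at most one title from it.
printf '%s\n' "$feed" | sed -n 's,.*<title>\([^<]*\)</title>.*,\1,p'

# Splitting first gives every <title> and <description> its own line,
# so the substitution now runs once per title: First, then Second.
printf '%s\n' "$feed" |
  sed -e 's,<\(title\|description\)>,\n<\1>,g' |
  sed -n 's,^<title>\([^<]*\)</title>.*,\1,p'
```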

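To print the descriptions too, drop the grep (it only keeps the text between the first <title> and the last </title>, so descriptions outside that span, such as the last item's, are lost) and let one sed print both kinds of lines. Anchoring on ^ and discarding everything after the closing tag keeps stray markup such as </item> out of the output:

curl "http://www.dailystar.com.lb/RSS.aspx?id=113" 2>/dev/null  | \
      sed -e 's,<\(title\|description\)>,\n<\1>,g' | \
      sed -n -e 's,^<title>\([^<]*\)</title>.*,\1,p' \
             -e 's,^<description>\([^<]*\)</description>.*,\1,p'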
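A possible extension, sketched under assumptions: tag each extracted line as a title (T:) or a description (D:) and let awk print each title next to the description that follows it. The helper name rss_pairs and the sample feed are hypothetical; GNU sed is assumed, and neither field may contain "<" (CDATA sections or embedded HTML are not handled).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "title<TAB>description" for every title that is
# followed by a description. Assumes GNU sed and no "<" inside either field.
rss_pairs() {
  # Split tags onto their own lines, tag them T:/D:, then pair them in awk.
  sed -e 's,<\(title\|description\)>,\n<\1>,g' |
    sed -n -e 's,^<title>\([^<]*\)</title>.*,T:\1,p' \
           -e 's,^<description>\([^<]*\)</description>.*,D:\1,p' |
    awk '/^T:/ { title = substr($0, 3); next }
         /^D:/ { print title "\t" substr($0, 3) }'
}

# Made-up one-line feed used only for this demo.
feed='<rss><channel><item><title>First</title><description>One</description></item><item><title>Second</title><description>Two</description></item></channel></rss>'
printf '%s\n' "$feed" | rss_pairs
```

On this sample it prints "First", a tab, "One", then "Second", a tab, "Two". A channel-level title and description, if present, would be paired the same way.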

Upvotes: 1

anubhava

Reputation: 784938

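With GNU grep you can use Perl-compatible regular expressions (grep -P): \K drops the <title> tag from the printed match, [\s\S]*? is a non-greedy match, and the lookahead (?=</title>) stops at the closing tag without printing it: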

curl "http://www.dailystar.com.lb/RSS.aspx?id=113" 2>/dev/null |\
      grep -oP "<title>\K[\s\S]*?(?=</title>)"
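The same grep -P pattern can be extended to descriptions by capturing the tag name and reusing it in the lookahead. This is a sketch on a made-up feed; it assumes GNU grep built with PCRE support.

```shell
#!/usr/bin/env bash
# Sketch: extract both titles and descriptions with one PCRE.
# Requires GNU grep with -P; the sample feed is made up.
feed='<rss><channel><item><title>First</title><description>One</description></item><item><title>Second</title><description>Two</description></item></channel></rss>'

# (title|description) captures the tag name, \K drops the opening tag from
# the printed match, [\s\S]*? is the non-greedy body, and the lookahead
# (?=</\1>) requires the matching closing tag without printing it.
printf '%s\n' "$feed" | grep -oP '<(title|description)>\K[\s\S]*?(?=</\1>)'
```

Each match is printed on its own line in document order: First, One, Second, Two.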

Upvotes: 1

sshashank124

Reputation: 32189

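You should use a non-greedy match (.*?) instead of the greedy .* to get all the titles. POSIX basic and extended regular expressions (sed, grep -E) have no non-greedy quantifier, so the lazy match needs grep -P (PCRE, available in GNU grep). Once grep has put each title on its own line, the greedy sed is harmless:

curl "http://www.dailystar.com.lb/RSS.aspx?id=113" 2>/dev/null  | grep -o -P "<title>(.*?)</title>" | sed -e 's,.*<title>\(.*\)</title>.*,\1,g' | less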
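A small offline comparison of the greedy and non-greedy matches, plus a negated character class ([^<]*) as a portable alternative that needs no -P. The sample line is made up; the -P command assumes GNU grep with PCRE support.

```shell
#!/usr/bin/env bash
# Sketch comparing greedy, non-greedy and negated-class matches on a
# made-up line holding three titles.
line='<title>A</title><title>B</title><title>C</title>'

# Greedy ERE: a single match from the first <title> to the last </title>.
printf '%s\n' "$line" | grep -oE '<title>.*</title>'

# Non-greedy PCRE (GNU grep -P): one match per title.
printf '%s\n' "$line" | grep -oP '<title>.*?</title>'

# Portable alternative without -P: forbid "<" inside the title text.
printf '%s\n' "$line" | grep -oE '<title>[^<]*</title>'
```

The first command prints the whole line as one match; the other two print three lines, one per title.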

Upvotes: 0
