Tom D

Reputation: 41

How to manipulate text with awk?

How can I manipulate the output text of grep?

Right now I am using the command:

grep -i "<url>" "$file" >> "./txtFiles/$file.txt"

This would output something like this:

<url>http://www.simplyrecipes.com/recipes/chicken_curry_salad/</url>

and then the next text goes onto the next line.

How can I get rid of the <url> and </url> tags, and stop the output from moving to the next line at the end?

Upvotes: 0

Views: 126

Answers (2)

Zombo

Reputation: 1

sed '/<\/*url>/!d;s///g'
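  • <\/*url> matches both the start tag and the end tag
  • !d deletes every line that does not contain a match
  • s///g reuses that same pattern (an empty regex means "the last regex used") and removes every occurrence of it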

With your example, it might look like this:

sed '/<\/*url>/!d;s///g' "$file" >> "./txtFiles/$file.txt"
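To see what the command does, here is a minimal sketch that feeds it two sample lines on stdin. The function name strip_url_tags and the <title> line are made up for illustration; only the URL line comes from the question.

```shell
# Hypothetical helper: wraps the sed command above and reads from stdin.
strip_url_tags() {
  # Drop lines without <url> or </url>, then delete every such tag.
  sed '/<\/*url>/!d;s///g'
}

# Sample input: one line without the tag, one line with it.
printf '%s\n' \
  '<title>Chicken Curry Salad</title>' \
  '<url>http://www.simplyrecipes.com/recipes/chicken_curry_salad/</url>' \
  | strip_url_tags
```

Only the bare URL should be printed; the <title> line is dropped because it contains neither tag.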

Upvotes: 2

Zsolt Botykai

Reputation: 51593

Single commands:

sed -n '/<url>/ { s|<url>\(.*\)</url>|\1| ; p ; }' INPUT > OUTPUT

Or with awk:

awk -F "</?url>" '/<url>/ { print $2 }' INPUT > OUTPUT

Note: both might give you invalid output if more than one <url>...</url> pair occurs on a single line. In the sed version, | is only the delimiter of the s command, so it must not appear unescaped in the pattern or replacement; pipe (|) characters in the input data itself are harmless.
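If several <url>...</url> pairs can share a line, or if you do not want a newline after each URL (the second half of the question), one possible approach is to loop over the matches with awk's match() and emit them with printf. This is only a sketch: the function name extract_urls and the choice of a single space as separator are assumptions, and it assumes the URLs themselves contain no < character.

```shell
# Hypothetical helper: prints every <url>...</url> body found on stdin,
# separated by single spaces, with no trailing newline.
extract_urls() {
  awk '{
    line = $0
    while (match(line, /<url>[^<]*<\/url>/)) {
      # RSTART and RLENGTH delimit the match; skip the 5-character
      # opening tag and drop the 6-character closing tag.
      printf "%s%s", sep, substr(line, RSTART + 5, RLENGTH - 11)
      sep = " "
      # Continue searching after the current match.
      line = substr(line, RSTART + RLENGTH)
    }
  }'
}

printf '%s\n' \
  '<url>http://example.com/a</url> text <url>http://example.com/b</url>' \
  '<url>http://example.com/c</url>' \
  | extract_urls
```

Because printf is used instead of print, nothing is written after the last URL, so whatever you append to the output file next continues on the same line.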

Upvotes: 0
