JohnnyFromBF

Reputation: 10191

How to search and replace this string with sed?

I'm desperately trying to search for the following string:

<texit info> author=MySelf title=MyTitle </texit>

and replace it with nothing.

What I've tried so far is the following:

sed -i '1,5s/<texit//;s/info>//;s/author=MySelf//;s/title=MyTitle//' test.txt

But it doesn't work.

Upvotes: 0

Views: 320

Answers (2)

Charles Duffy

Reputation: 295989

Don't edit XML with sed -- the right tool would be something like XMLStarlet, with a line like the following:

xmlstarlet ed -u '//texit[@info]' -v 'author=NewAuthor title=NewTitle'

...if your goal were to update the text within the tag.

Regular expressions are not expressive enough to handle XML correctly. This holds even formally: regular expressions can recognize only regular languages, and XML is not one. For instance, your original would be just as valid written with a newline inside the tag, as:

<texit
  info >author=MySelf title=MyTitle</texit>

...and writing a sed command to handle that case would not be fun. XML-native tools, on the other hand, can correctly handle all of XML's corner cases.
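To make the line-break problem concrete, here is a minimal sketch. The pattern is a hypothetical one-line expression written only for this illustration, not a command from the question. Because sed works on one input line at a time, the expression matches the single-line form but leaves the two-line form above untouched.

```shell
# Hypothetical one-line pattern; sed applies it to each input line separately.
pattern='s/<texit info>.*<\/texit>//'

# Element on a single line: the pattern matches and the line is emptied.
one_line=$(printf '%s\n' '<texit info>author=MySelf title=MyTitle</texit>' | sed "$pattern")

# Same element split across two lines: neither line matches on its own,
# so the input passes through unchanged.
two_lines=$(printf '%s\n' '<texit' '  info>author=MySelf title=MyTitle</texit>' | sed "$pattern")

printf 'one line:  [%s]\n' "$one_line"
printf 'two lines: [%s]\n' "$two_lines"
```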

That said, the sed expression you gave does indeed "work", inasmuch as it does exactly what it's written to do.

sed -e '1,5s/<texit//;s/info>//;s/author=MySelf//;s/title=MyTitle//' \
  <<<"<texit info>author=MySelf title=MyTitle foo bar</texit>"

returns the output

   foo bar</texit>

which is exactly what it should do: it removes the strings <texit, info>, author=MySelf and title=MyTitle, but leaves the closing </texit> and any other text, just as you asked. If you expect or want it to do something different, you should explain what that is.
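One detail of that command that is easy to miss: an address such as 1,5 binds only to the command that immediately follows it, so the later s commands run on every line. The sketch below uses made-up three-line input and a 1,2 address purely for illustration. Grouping the commands in braces is one way to make the address cover all of them.

```shell
# Three identical input lines, so the effect of the address is easy to see.
input=$(printf '%s\n' 'xab' 'xab' 'xab')

# The address 1,2 binds only to the first s command; s/b// runs on every line.
ungrouped=$(printf '%s\n' "$input" | sed '1,2s/a//;s/b//')

# Braces apply the address to both commands, so line 3 is left alone.
grouped=$(printf '%s\n' "$input" | sed '1,2{s/a//;s/b//;}')

printf 'ungrouped:\n%s\n' "$ungrouped"
printf 'grouped:\n%s\n' "$grouped"
```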

Upvotes: 2

beerbajay

Reputation: 20300

sed 's/<texit\s\+info>\s*author=MySelf\s\+title=MyTitle\s*<\/texit>//g' test.txt

You should generally not edit XML with a regex, but if you only want to strip these tags, the above will work. You don't need multiple s commands, just use a single pattern with correctly defined whitespace.
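Note that \s and \+ are GNU sed extensions. As a rough sketch for other sed implementations, the same single pattern can be written with POSIX bracket expressions. The function name strip_texit and the sample input line are made up for this illustration.

```shell
# Same idea as the command above, with POSIX bracket expressions in place of
# the GNU-specific \s and \+ escapes. strip_texit is a hypothetical helper name.
strip_texit() {
  sed 's/<texit[[:space:]][[:space:]]*info>[[:space:]]*author=MySelf[[:space:]][[:space:]]*title=MyTitle[[:space:]]*<\/texit>//g'
}

# Sample input: the element is removed, the surrounding text is kept.
printf '%s\n' 'before <texit info> author=MySelf title=MyTitle </texit> after' | strip_texit
```

Like the original, this only matches when the whole element sits on one line.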

Upvotes: 2
