Gargauth

Reputation: 2545

Sed removing duplicate characters and certain characters in beginning/end of string

I am asking for your help with sed. I need to collapse duplicate underscores and remove underscores from the beginning and end of a string.

For example:

echo '[Lorem] ~ ipsum *dolor* sit metus !!!' | sed 's/[^ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789._()-]/_/g'

Produces: _Lorem____ipsum__dolor__sit_metus____

But I need to further format this string to: Lorem_ipsum_dolor_sit_metus

In other words, remove any underscores from the beginning and end of the string, and reduce multiple consecutive underscores to just one, preferably using another pipe.

Do you have any idea how to do that?

Thank you.

Upvotes: 2

Views: 7398

Answers (2)

Dennis Williamson

Reputation: 360143

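All you need to do is add a "+" after your bracket expression (written \+ in GNU sed's basic regular expressions) so that each run of disallowed characters becomes a single underscore. Then you can delete the leading and trailing ones. Using two separate substitutions for that also handles strings that have an underscore at only one end. Also, as ladenedge suggested, you can use a character class to shorten your list.

sed 's/[^[:alnum:].()-]\+/_/g;s/^_//;s/_$//'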
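
As a minimal sketch, a variant of this command can be wrapped in a shell function and tried on the sample string from the question. The function name sanitize is hypothetical, and -E (extended regular expressions, accepted by both GNU and BSD sed) is swapped in here so that + needs no backslash.

```shell
# sanitize: hypothetical helper name, not part of the original answer.
sanitize() {
  # 1. Replace each run of disallowed characters with a single underscore.
  # 2. Drop one leading underscore, then one trailing underscore, if present.
  sed -E 's/[^[:alnum:].()-]+/_/g; s/^_//; s/_$//'
}

echo '[Lorem] ~ ipsum *dolor* sit metus !!!' | sanitize
```

Because runs are already collapsed by the first substitution, at most one underscore can remain at each end, so the two anchored substitutions are enough.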

Upvotes: 1

mouviciel

Reputation: 67839

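Just add ;s/__*/_/g;s/^_//;s/_$// right after the g in your sed command. The first expression squeezes each run of underscores into one, and the other two strip a leading and a trailing underscore.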
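
Since the question asked for a separate pipe, here is a rough sketch of the same three expressions as their own pipeline stage. The function name squeeze_underscores is hypothetical, and the first sed call is copied from the question unchanged.

```shell
# squeeze_underscores: hypothetical name for the second pipeline stage.
squeeze_underscores() {
  # Collapse runs of underscores, then strip one from each end of the line.
  sed 's/__*/_/g;s/^_//;s/_$//'
}

echo '[Lorem] ~ ipsum *dolor* sit metus !!!' \
  | sed 's/[^ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789._()-]/_/g' \
  | squeeze_underscores
```

The pattern __* uses only basic regular expression syntax, so it should not depend on GNU extensions.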

Upvotes: 3
