sed is misbehaving when replacing certain regular expressions

Question

I am trying to remove numbers, but only when they immediately follow periods. Similar replacements seem to work correctly, but not when the pattern involves a period.

I have tried the following, which was given as a solution in another post:

echo "fr.r1.1.0" | sed s/\.[0-9][0-9]*/\./g

I get fr.... as output. It seems that, even though I escape the period, it matches arbitrary characters instead of only periods.

This expression seems to work for the previous example:

echo "fr.r1.1.0" | sed s/[[:punct:]][0-9][0-9]*/\./g

and gives me fr.r1.. but then for

echo "ge.s1_1.0" | sed s/[[:punct:]][0-9][0-9]*/\./g

I get ge.s1.. instead of ge.s1_1.

Allan · Accepted Answer

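You will have to put the sed instructions between single quotes to keep your shell from interpreting some of the special characters. Unquoted, the shell removes the backslash before sed ever sees it, so sed receives s/.[0-9][0-9]*/./g, in which the unescaped . matches any character: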

echo "fr.r1.1.0" | sed 's/\.[0-9][0-9]*/\./g'
fr.r1..
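To see the quoting effect directly, here is a minimal sketch that prints the argument exactly as a command would receive it. show_arg is a hypothetical helper, not part of sed, and the sketch assumes a POSIX-style shell such as bash or sh and that no path in the current directory happens to match the unquoted pattern as a glob.

```shell
# show_arg is a hypothetical helper: it prints its first argument
# exactly as the shell passed it, one per line.
show_arg() { printf '%s\n' "$1"; }

# Unquoted: the shell removes the backslashes before the command runs.
show_arg s/\.[0-9][0-9]*/\./g      # prints s/.[0-9][0-9]*/./g

# Single-quoted: the text is passed through unchanged.
show_arg 's/\.[0-9][0-9]*/\./g'    # prints s/\.[0-9][0-9]*/\./g
```

The first form is what sed was given in the question, which is why the leading . behaved as "any character".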

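Also, you do not need to escape the dot in the replacement part, since . has no special meaning there, and [0-9][0-9]* can be shortened to [0-9]\+ (note that \+ is a GNU sed extension, not part of POSIX basic regular expressions), giving the simplified command: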

echo "fr.r1.1.0" | sed 's/\.[0-9]\+/./g'
fr.r1..
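If the GNU-specific \+ is a concern, two alternatives should behave the same way. This is a sketch rather than a tested recommendation for every platform: it assumes a sed that accepts -E for extended regular expressions (GNU and BSD sed both do), and strip_ere and strip_bre are hypothetical helper names used only for illustration.

```shell
# Hypothetical helpers: each replaces a dot followed by one or more
# digits with a single dot.

# Extended regular expressions: + means "one or more".
strip_ere() { sed -E 's/\.[0-9]+/./g'; }

# POSIX basic regular expressions: \{1,\} means "one or more".
strip_bre() { sed 's/\.[0-9]\{1,\}/./g'; }

echo "fr.r1.1.0" | strip_ere    # prints fr.r1..
echo "ge.s1_1.0" | strip_bre    # prints ge.s1_1.
```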

Last but not least, the POSIX [:punct:] character class is defined as

"punctuation (all graphic characters except letters and digits)" (https://en.wikibooks.org/wiki/Regular_Expressions/POSIX_Basic_Regular_Expressions)

so it also includes the underscore (and many other characters). Therefore, if you want to limit your matches to . followed by digits, you need to match the dot explicitly (escaped as \., or via its ASCII value).
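The difference can be checked side by side on the second input from the question. This sketch uses a bracket expression, [.], as one more way to write a literal dot (a swapped-in alternative to \. that does not depend on backslash handling); dot_class and dot_literal are hypothetical helper names.

```shell
# Hypothetical helpers for comparison.

# [[:punct:]] matches any punctuation character, including "_".
dot_class()   { sed 's/[[:punct:]][0-9][0-9]*/./g'; }

# [.] matches only a literal dot.
dot_literal() { sed 's/[.][0-9][0-9]*/./g'; }

echo "ge.s1_1.0" | dot_class      # prints ge.s1..   ("_1" is replaced too)
echo "ge.s1_1.0" | dot_literal    # prints ge.s1_1.  (only ".0" is replaced)
```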
