Mansueli

Reputation: 7014

Regex Space character in Sed

I've tried almost everything I can think of, but nothing has worked. (Operating system: Ubuntu 12.04)

Expressions to be matched and removed from text files:

a c 4
a k 23
o s 1

What I tried:

's/[[a-z][:space:][a-z][:space:][0-9]]\{1,\}//gi'
's/.\s.\s[0-9]+//g'
's/[:alpha:][:space:][:alpha:][:space:][:digit:]+'

Upvotes: 20

Views: 53935

Answers (3)

perreal

Reputation: 98118

This should match:

sed 's/[a-z][ ][ ]*[a-z][ ][ ]*[0-9][0-9]*//gi'

Your first attempt is missing the square brackets around each named class ([:space:] has to be written as [[:space:]]), and the outermost pair of brackets is not needed:

sed 's/[a-z][[:space:]][a-z][[:space:]][0-9]\{1,\}//gi' input

Your second attempt fails because the + has to be escaped, and even then it only works in GNU sed, since \s and \+ are GNU extensions:

sed 's/.\s.\s[0-9]\+//g' input

The last one has similar problems: each named class needs its own enclosing brackets, the + has to be escaped, and the closing // is missing:

sed 's/[[:alpha:]][[:space:]][[:alpha:]][[:space:]][[:digit:]]\+//' input
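
To check a pattern like this against the lines from the question, a quick test might look like the sketch below. The surrounding words ("keep", "this", and so on) are made up for illustration, and the pattern uses \{1,\} instead of \+ so that it does not depend on GNU extensions:

```shell
# Hypothetical test data: the three expressions from the question,
# embedded in some made-up surrounding text.
sample='keep a c 4 this
keep a k 23 too
o s 1 end'

# POSIX bracket classes plus an interval expression (one or more digits).
cleaned=$(printf '%s\n' "$sample" |
  sed 's/[[:alpha:]][[:space:]][[:alpha:]][[:space:]][[:digit:]]\{1,\}//g')

printf '%s\n' "$cleaned"
```

With this sample the output should be "keep  this", "keep  too" and " end": the matched expressions are removed, but the spaces on either side of them are left in place.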

Upvotes: 27

choroba

Reputation: 242343

[...] defines a character class. [a-z] matches any character from a to z. To match consecutive characters, you have to use a class for each: [a-z][[:space:]][a-z].

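sed uses basic regular expressions by default, where an unescaped + is an ordinary character. For + to have its special meaning, you have to backslash it: [0-9]\+ (a GNU extension).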
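
The difference is easiest to see on a string that contains a literal plus sign. A small sketch, with a made-up input string; it uses the POSIX interval \{1,\} for "one or more" so that it should behave the same on GNU and non-GNU sed:

```shell
text='1+ 123'

# Unescaped: + is an ordinary character here, so this matches
# one digit followed by a literal "+".
literal=$(printf '%s\n' "$text" | sed 's/[0-9]+/X/')

# Interval expression: one or more digits.
repeated=$(printf '%s\n' "$text" | sed 's/[0-9]\{1,\}/X/')

printf '%s\n%s\n' "$literal" "$repeated"
```

The first command should print "X 123" (it consumed "1+"), the second "X+ 123" (it consumed only the leading run of digits).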

Named character classes such as [:alpha:] only work inside a bracket expression, i.e. [[:alpha:]][[:space:]].
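
One practical consequence, sketched below with a made-up input line: [[:space:]] also matches a tab, whereas a bracket expression containing only a literal space does not.

```shell
# Hypothetical input: a tab (not a space) between the two letters.
line=$(printf 'a\tc 4')

# A literal space in brackets does not match the tab, so nothing is removed.
with_space=$(printf '%s\n' "$line" | sed 's/[a-z][ ][a-z][ ][0-9]//')

# The named class matches both the tab and the space, so the whole line is removed.
with_class=$(printf '%s\n' "$line" | sed 's/[a-z][[:space:]][a-z][[:space:]][0-9]//')

printf '[%s]\n[%s]\n' "$with_space" "$with_class"
```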

Upvotes: 1

svckr

Reputation: 839

The one in the middle is close! You have to escape the plus sign for a reason that is beyond me. I also replaced the dot "." with "[a-z]" so it only matches letters.

sed 's/[a-z]\s[a-z]\s[0-9]\+//g'

Bonus portable version for older sed versions (hello, Mac users!). Some sed implementations are picky about "\+" and "\s", so both are spelled out here:

sed 's/[a-z][[:space:]][a-z][[:space:]][0-9][0-9]*//g'
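
Another way to avoid the backslash is extended regular expressions, where + needs no escaping. This is only a sketch: it assumes your sed accepts -E (current GNU and BSD/macOS versions do; older GNU releases spell it -r), and the input line is made up:

```shell
# With -E the + quantifier works unescaped; [[:space:]] replaces the GNU-only \s.
cleaned=$(printf 'x a k 23 y\n' |
  sed -E 's/[a-z][[:space:]][a-z][[:space:]][0-9]+//g')

printf '%s\n' "$cleaned"
```

This should print "x  y", with the two spaces that surrounded the removed expression still there.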

Upvotes: 2
