D.Zou

Reputation: 798

Why does one fewer space in a regex make my sed go weird?

Here is an example of a regex I am trying to figure out. The goal is to collapse extra spaces so that there is only one space between words, using sed. The sample input has three spaces between sdf and sdk:

test@ubuntu:~/addr_book_script$ echo "est sdf   sdk" | sed 's/  */ /g'
est sdf sdk
test@ubuntu:~/addr_book_script$ echo "est sdf   sdk" | sed 's/ */ /g'
e s t s d f s d k

You will notice that the two sed statements differ only in the number of spaces before the *. The first statement has two spaces and behaves exactly as I wanted.

The second statement has one space before the * and sticks a space between every letter and word.

I know that * means any number of occurrences of whatever precedes it. What I don't understand is why the one-space sed replacement behaves the way it does.

Thanks

Upvotes: 4

Views: 66

Answers (2)

Jahid

Reputation: 22428

sed 's/ */ /g'

The regex " *" (space-star) matches 0 or more occurrences of a space, so it also matches the empty string.

  1. At the start of the string, a zero-space match is found and replaced by a single space.
  2. After the first letter, another zero-space match is found and replaced by a single space, and likewise after each following letter.
  3. After est, a run of one or more spaces is found and replaced by a single space.

The same pattern repeats for the rest of the line.
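To make the zero-length matches visible, one option is to substitute a visible character instead of a space. This is a minimal sketch, assuming GNU sed (other implementations may handle empty matches slightly differently); the variable names are only for illustration:

```shell
# Same pattern as in the question, but the replacement is "_" so that
# every match, including the empty ones, shows up in the output.
input="est sdf   sdk"
marked=$(printf '%s\n' "$input" | sed 's/ */_/g')
printf '%s\n' "$marked"
```

With GNU sed this prints _e_s_t_s_d_f_s_d_k_: an underscore wherever zero spaces matched, and a single underscore for each run of real spaces.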

Another example:

~ >>> echo "est sdf   sdk" | sed 's/a*/ /g'
 e s t   s d f       s d k 

The replacements occur because a* matches zero characters at every position, even though the input contains no a.

Upvotes: 2

Ghost

Reputation: 2226

" *" (space-star) in a regex means 0 or more occurrences of a space, so it replaces every match of 0 or more spaces, including the empty match between two letters, with a space.

"  *" (space-space-star) forces there to be at least one space.

" +" (space-plus) would accomplish the same thing in some regular expression flavors, such as ERE, but not in the BRE that sed uses by default, where + is a literal plus sign.
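As a sketch of two ways to write "one or more spaces" without doubling the space, assuming a sed that accepts the -E option for extended regular expressions (GNU and BSD sed both do); the variable names are illustrative:

```shell
input="est sdf   sdk"

# POSIX BRE interval expression: \{1,\} means "one or more" of the preceding space.
bre=$(printf '%s\n' "$input" | sed 's/ \{1,\}/ /g')

# ERE via -E: here + means "one or more" of the preceding space.
ere=$(printf '%s\n' "$input" | sed -E 's/ +/ /g')

printf '%s\n' "$bre" "$ere"
```

Both commands print est sdf sdk, the same result as the space-space-star version.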

Upvotes: 1
