Laura

Reputation: 83

Understanding how sed works in bash

#!/bin/bash
echo "the first application of sed"
sed -e 's/^\([0-9]\{3\}\)/(\1)/' s.txt
echo "the second application of sed"
sed -e 's/^\([0-9]\{3\}\)/(\1\+\1)/' s.txt
echo "see the original file"
cat s.txt

the first application of sed
(905)-123-3456
(905)-124-3456
(905)-125-3456
(905)-126-3456
(905)-127-3456
the second application of sed
(905+905)-123-3456
(905+905)-124-3456
(905+905)-125-3456
(905+905)-126-3456
(905+905)-127-3456
see the original file
905-123-3456
905-124-3456
905-125-3456
905-126-3456
905-127-3456

I'm just starting out in shell programming, and I've been stuck on this code for the last two hours. I know the basic usage of sed, but I cannot figure out what the line

sed -e 's/^\([0-9]\{3\}\)/(\1)/' s.txt

does. I know that -e introduces an expression, s means substitute, and ^ anchors the match to the beginning of the line, but the part after that is confusing. Any ideas?

Upvotes: 0

Views: 45

Answers (2)

zedfoxus

Reputation: 37069

Let's break this down:

sed -e 's/^\([0-9]\{3\}\)/(\1)/' s.txt

The general form of sed's substitute command is:

s/search/replace/options

In your case, the search part is ^\([0-9]\{3\}\). In the basic regular expressions that sed uses by default, parentheses and curly braces take on their special meaning (grouping and repetition) only when preceded by a \. If we drop the backslashes for readability, this is how it looks:

^([0-9]{3})

It means the line must start with a digit between 0 and 9, repeated exactly 3 times. So basically, it matches the first three characters of a line when they are all digits (e.g. 123, 543, etc.).

The parentheses () group the 3 digits, which can then be referred to as the first group.

The replace part is (\1). Here the parentheses are literal characters, and \1 stands for the group captured in the search part, so the three digits are written back wrapped in parentheses.
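To see the capture and the back-reference in isolation, here is a minimal sketch that feeds a single line through the same command. The sample line comes from the question's s.txt; the variable names are only for illustration.

```shell
#!/bin/bash
# One sample line from the question's s.txt.
line='905-123-3456'

# \( ... \) captures the three leading digits; \1 in the replacement
# writes them back, surrounded by literal parentheses.
wrapped=$(printf '%s\n' "$line" | sed -e 's/^\([0-9]\{3\}\)/(\1)/')

echo "$wrapped"   # prints (905)-123-3456
```

Only the first three digits change; the rest of the line is passed through untouched.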

Upvotes: 2

Jonathan Leffler

Reputation: 753990

Ultimately, it is a manual-bashing exercise.

  • \( marks the start of a capture, up to the balanced \) — they can be nested, though these ones don't.
  • \{ marks the start of a repeat specification up to the following \} — they cannot be nested. In this case, you have \{3\} so this repeats the previous item, [0-9], three times.
  • The \1 in the replacement refers to the material captured by the first \( in the search pattern.

Hence:

s/^\([0-9]\{3\}\)/(\1)/

wraps the three digits at the start of the line in parentheses — as shown in your output. Because it is anchored, it happens just once. If a line doesn't start with three digits, nothing happens to that line as a result of this command.
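The effect of the anchor can be checked with a short sketch. Both input lines below are hypothetical and do not appear in the question's s.txt.

```shell
#!/bin/bash
# Hypothetical input that does not start with three digits:
# the pattern cannot match at the start, so the line is unchanged.
no_match=$(printf '%s\n' 'tel: 905-123-3456' | sed -e 's/^\([0-9]\{3\}\)/(\1)/')

# Hypothetical input made only of digits: because of ^, only the
# first three digits are wrapped, not every run of three.
only_first=$(printf '%s\n' '9051234567' | sed -e 's/^\([0-9]\{3\}\)/(\1)/')

echo "$no_match"     # prints tel: 905-123-3456
echo "$only_first"   # prints (905)1234567
```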

The second example is only marginally different. It takes the sequence of three digits at the start of the line and replaces it with that sequence, a + mark, and the sequence again, all wrapped in parentheses — as shown in your output.
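As a small sketch of the second substitution: the replacement can use \1 more than once. This version writes the + without a backslash, on the assumption that + has no special meaning in the replacement part; the question's script escapes it as \+ and its output shows the same result.

```shell
#!/bin/bash
# Same sample line as in the question's s.txt.
line='905-123-3456'

# \1 appears twice in the replacement, so the captured digits are
# written out twice with a literal + between them.
doubled=$(printf '%s\n' "$line" | sed -e 's/^\([0-9]\{3\}\)/(\1+\1)/')

echo "$doubled"   # prints (905+905)-123-3456
```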

There are relatively few metacharacters in the replacement part of an s/// command; there are a lot of metacharacters in the search part. Further, the search part comes in different dialects: some variants of sed support 'extended regular expressions' instead of the 'basic regular expressions' your example uses, and others support Perl-like expressions (not quite the full PCRE, Perl Compatible Regular Expressions, as far as I know, but some notations from PCRE). For the details, you need to read the manual for the sed you're using.
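For comparison, here is a sketch of the same substitution written with extended regular expressions. It assumes your sed accepts the -E option (GNU sed and BSD sed both do; check your manual otherwise). With -E, grouping and repetition are written without backslashes.

```shell
#!/bin/bash
# Same sample line as in the question's s.txt.
line='905-123-3456'

# Basic regular expression, as in the question.
bre=$(printf '%s\n' "$line" | sed -e 's/^\([0-9]\{3\}\)/(\1)/')

# Extended regular expression: ( ) and { } are special without a backslash.
# Assumes a sed that supports -E.
ere=$(printf '%s\n' "$line" | sed -E -e 's/^([0-9]{3})/(\1)/')

echo "$bre"   # prints (905)-123-3456
echo "$ere"   # prints (905)-123-3456
```

The replacement part is identical in both forms; only the search pattern's escaping differs.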

Upvotes: 3
