David17

Reputation: 183

How to use back-reference of sed replacement command correctly considering a special Regular Expression

I am learning the sed s/regexp/replacement/ command on Linux.

Here are some numbers from phone.txt:

(555)555-1212
(555)555-1213
(555)555-1214
(666)555-1215
(777)555-1217

I'd like to use the regular expression (which I have tested on https://www.freeformatter.com/regex-tester.html)

 (\(555\))(.*-)(.*$)

to match numbers that begin with (555). I then want to print the three parts of each matched number like this (shown for the number (555)555-1212):

Area code: (555) Second: 555- Third: 1212

I tried the following command:

cat phone.txt | sed 's/\(\\\(555\\\)\)\(.*-\)\(.*$)/Area code: \1 Second: \2 Third: \3/'

But the system gave me:

sed: -e expression #1, char 66: Unmatched ( or \(

The general command for all numbers was:

cat phone.txt | sed 's/\(.*)\)\(.*-\)\(.*$\)/Area code: \1 Second: \2 Third: \3/'

Source: https://www.tutorialspoint.com/unix/unix-regular-expressions.htm

But I only want to run sed on numbers that begin with (555), and include the (555) in the output through a back-reference.

Could you tell me how to write this special command correctly?

Upvotes: 10

Views: 18752

Answers (2)

David C. Rankin

Reputation: 84579

You can write the substitution generically, using the punctuation already in each line to pick out the three parts (for example 555, 555 and 1212), without hard-coding any specific prefix in the s/find/replace/ form of sed. You can then restrict which lines it applies to by putting a matching condition before the substitution, where you would enter your 555 or 666, etc.

To include the pattern match along with the substitution, you use the following form:

sed '/pattern/s/find/replace/'

To print only the lines that match the pattern, pass the -n option to suppress the automatic printing of the pattern space, and add a p at the end of the substitute form to explicitly print the lines where the substitution was made, e.g.

sed -n '/pattern/s/find/replace/p'

Now, let's turn to your problem at hand. To limit your reformatted output to only those lines beginning with (555) you would do:

$ sed -n '/^(555)/s/^(\([^)]*\))\([^-]*\)-\(.*\)$/Area code: (\1) Second: \2- Third: \3/p' file
Area code: (555) Second: 555- Third: 1212
Area code: (555) Second: 555- Third: 1213
Area code: (555) Second: 555- Third: 1214

(note: the backreferences capture only the numbers and not the (..) or '-')
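To see what -n and the trailing p actually change, here is a minimal sketch that runs the same address-plus-substitution command both ways on two lines from the question. The variable names (input, all_lines, only_matches) are only illustrative and are not part of the original command.

```shell
# Two sample lines from the question's phone.txt (held in a variable here
# purely for illustration).
input='(555)555-1212
(666)555-1215'

# Address + substitution without -n: every line is printed, but only
# lines matching /^(555)/ are rewritten.
all_lines=$(printf '%s\n' "$input" |
  sed '/^(555)/s/^(\([^)]*\))\([^-]*\)-\(.*\)$/Area code: (\1) Second: \2- Third: \3/')

# Same command with -n and a trailing p: only the rewritten lines are printed.
only_matches=$(printf '%s\n' "$input" |
  sed -n '/^(555)/s/^(\([^)]*\))\([^-]*\)-\(.*\)$/Area code: (\1) Second: \2- Third: \3/p')

printf '%s\n' "$all_lines"
echo '---'
printf '%s\n' "$only_matches"
```

Without -n, the (666) line passes through unchanged; with -n and p, it is dropped from the output.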

To reformat all lines, you would remove the -n and /pattern/ along with the final p, using only the base sed 's/find/replace/' form, e.g.

$ sed 's/^(\([^)]*\))\([^-]*\)-\(.*\)$/Area code: (\1) Second: \2- Third: \3/' file
Area code: (555) Second: 555- Third: 1212
Area code: (555) Second: 555- Third: 1213
Area code: (555) Second: 555- Third: 1214
Area code: (666) Second: 555- Third: 1215
Area code: (777) Second: 555- Third: 1217

Look things over and let me know if you have further questions.

Upvotes: 5

Wiktor Stribiżew

Reputation: 627083

You are using POSIX BRE syntax in your sed command, and in such patterns, unescaped parentheses match literal parentheses, while escaped parentheses define capturing groups.

You may use

sed -E 's/(\(555\))(.*-)(.*)/Area code: \1 Second: \2 Third: \3/'


In POSIX ERE syntax (enabled with the -E option), literal parentheses are escaped, as in all common online regex testers, and unescaped parentheses define capturing groups.
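As a rough side-by-side sketch of the two syntaxes, the same substitution can be written in BRE (the default) and in ERE (-E) by swapping which parentheses carry the backslash. The BRE variant below is my own rewrite of the command above, and the variable names (line, bre_out, ere_out) are only illustrative.

```shell
# One sample line from the question.
line='(555)555-1212'

# BRE (default): \( \) define groups, bare ( ) match literal parentheses.
bre_out=$(printf '%s\n' "$line" |
  sed 's/\((555)\)\(.*-\)\(.*\)/Area code: \1 Second: \2 Third: \3/')

# ERE (-E): ( ) define groups, \( \) match literal parentheses.
ere_out=$(printf '%s\n' "$line" |
  sed -E 's/(\(555\))(.*-)(.*)/Area code: \1 Second: \2 Third: \3/')

printf '%s\n' "$bre_out"
printf '%s\n' "$ere_out"
```

Both commands should print the same line. Lines that do not start with (555) are left unchanged by either form; adding -n and a trailing p, as in sed -n -E 's/.../.../p', would print only the rewritten lines.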

Upvotes: 20
