barlop

Reputation: 13780

How do you specify non-capturing groups in sed?

Is it possible to specify non-capturing groups in sed?

If so, how?

Parentheses in sed have two functions: grouping and capturing.

So I'm asking about using parentheses to do the grouping, but without capturing: non-capturing grouping parentheses, i.e. parentheses that are neither capturing nor literal. These are called non-capturing groups. I've seen the syntax (?:regex) for non-capturing groups, but it doesn't work in sed.

Linguistic note: in the UK, "brackets" is a general term covering both "round brackets" and "square brackets", and on its own it usually refers to "( )", since "( )" are so common. The term "parentheses" is hardly used in the UK. In the USA, "brackets" means specifically "[ ]". So, to avoid confusing anybody in the USA, I've not used the word "brackets" in the question.

Upvotes: 71

Views: 39735

Answers (4)

Hector Llorens

Reputation: 135

As said, it is not possible to have non-capturing groups in sed.

It may be obvious, but non-capturing groups are not a necessity, unless you run into the back-reference limit (sed back-references stop at \9).

You can simply use the capturing groups you want and ignore the ones you don't, as if they were non-capturing.

For example, of the two captures here, \1 and \2, you can ignore \1 and just use \2:

$ echo blahblahblahc | sed -r "s/(blah){1,10}(.)/\2/"
c

For reference, nested capturing groups are numbered by the position-order of "(".

E.g.,

echo "apple and bananas and monkeys" | sed -r "s/((apple|banana)s?)/\1x/g"

applex and bananasx and monkeys (note: "s" in bananas, first bigger group)

vs

echo "apple and bananas and monkeys" | sed -r "s/((apple|banana)s?)/\2x/g"

applex and bananax and monkeys (note: no "s" in bananas, second smaller group)

Upvotes: 3

barlop

Reputation: 13780

The answer is that, as of writing, you can't: sed does not support it.

Non-capturing groups have the syntax (?:a) and are PCRE syntax.

sed supports BRE (basic regular expressions, aka POSIX BRE), and GNU sed has the option -r, which makes it support ERE (extended regular expressions, aka POSIX ERE), but still not PCRE.

Perl will work, on Windows or Linux.

Examples here:

https://superuser.com/questions/416419/perl-for-matching-with-regular-expressions-in-terminal

E.g. this, from Cygwin on Windows:

$ echo -e 'abcd' | perl -0777 -pe 's/(a)(?:b)(c)(d)/\1/s'
a

$ echo -e 'abcd' | perl -0777 -pe 's/(a)(?:b)(c)(d)/\2/s'
c

There is also a program, albeit for Windows only, that can do search and replace on the command line and supports PCRE. It's called rxrepl. It's not sed, of course, but it does search and replace with PCRE support.

C:\blah\rxrepl>echo abc | rxrepl -s "(a)(b)(c)" -r "\1"
a

C:\blah\rxrepl>echo abc | rxrepl -s "(a)(b)(c)" -r "\3"
c

C:\blah\rxrepl>echo abc | rxrepl -s "(a)(b)(?:c)" -r "\3"
Invalid match group requested.

C:\blah\rxrepl>echo abc | rxrepl -s "(a)(?:b)(c)" -r "\2"
c

C:\blah\rxrepl>

The author (not me) mentioned his program in an answer here: https://superuser.com/questions/339118/regex-replace-from-command-line

It has a really good syntax.

The standard thing to use would be perl, or almost any other programming language that people use.

Upvotes: 37

Dennis Williamson

Reputation: 360015

Parentheses can be used for grouping alternatives. For example:

sed 's/a\(bc\|de\)f/X/'

says to replace "abcf" or "adef" with "X", but the parentheses also capture. There is no facility in sed to do such grouping without also capturing. If you have a complex regex that does both alternative grouping and capturing, you will simply have to be careful to select the correct capture group in your replacement.
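A minimal sketch of that selection, assuming a sed that accepts -E for extended syntax (GNU and BSD sed both do). The input string and variable names are made up for illustration. The first pair of parentheses exists only to group the alternatives, so the replacement skips \1 and uses \2:

```shell
# Hypothetical input: a prefix that is either "abcf" or "adef", then digits.
input='adef-42'

# Group 1 only groups the alternation; group 2 is the part we want to keep.
result=$(printf '%s\n' "$input" | sed -E 's/a(bc|de)f-([0-9]+)/\2/')

echo "$result"
```

This should print 42; \1 (here "de") is captured as well but simply goes unused.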

Perhaps you could say more about what it is you're trying to accomplish (what your need for non-capturing groups is) and why you want to avoid capture groups.

Edit:

There is a type of non-capturing group, (?:pattern), that is part of Perl-Compatible Regular Expressions (PCRE). It is not supported in sed (but it is when using grep -P).

Upvotes: 36

SiegeX

Reputation: 140307

I'll assume you are speaking of the backreference syntax, which uses parentheses ( ), not brackets [ ].

By default, sed interprets ( ) literally and does not attempt to make a backreference from them. You need to escape them, as in \( \), to make them special. Only with the GNU sed -r option is the escaping reversed: with sed -r, unescaped ( ) produce backreferences and escaped \( \) are treated as literal. Examples follow:

POSIX sed

$ echo "foo(###)bar" | sed 's/foo(.*)bar/@@@@/'
@@@@

$ echo "foo(###)bar" | sed 's/foo(.*)bar/\1/'
sed: -e expression #1, char 16: invalid reference \1 on `s' command's RHS
-bash: echo: write error: Broken pipe

$ echo "foo(###)bar" | sed 's/foo\(.*\)bar/\1/'
(###)

GNU sed -r

$ echo "foo(###)bar" | sed -r 's/foo(.*)bar/@@@@/'
@@@@

$ echo "foo(###)bar" | sed -r 's/foo(.*)bar/\1/'
(###)

$ echo "foo(###)bar" | sed -r 's/foo\(.*\)bar/\1/'
sed: -e expression #1, char 18: invalid reference \1 on `s' command's RHS
-bash: echo: write error: Broken pipe

Update

From the comments:

Group-only, non-capturing parentheses ( ), which would let you apply something like an interval {n,m} to a group without creating a backreference \1, don't exist. First, the unescaped interval syntax {n,m} is not available in default (basic) sed syntax, where intervals are written \{n,m\}; you need the GNU -r option to write them unescaped. In either syntax, any grouping parentheses are also capturing for backreference use. Examples:

$ echo "123.456.789" | sed -r 's/([0-9]{3}\.){2}/###/'
###789

$ echo "123.456.789" | sed -r 's/([0-9]{3}\.){2}/###\1/'
###456.789
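For comparison, here is a sketch of the second command in basic syntax (no -r), where grouping is written \( \) and the interval is written \{n\}. The variable names are illustrative only. The repeated group still captures, and \1 holds the text matched by its last repetition:

```shell
# Same input as in the examples above.
input='123.456.789'

# Basic syntax: \( \) groups and captures, \{2\} repeats the group twice.
result=$(printf '%s\n' "$input" | sed 's/\([0-9]\{3\}\.\)\{2\}/###\1/')

echo "$result"
```

This should print ###456.789, the same result as the -r form.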

Upvotes: 7
