flohei

Reputation: 5308

Replace specific capture group instead of entire regex in Perl

I've got a regular expression with capture groups that matches what I want in a broader context. I then take capture group $1 and use it for my needs. That's easy.

But how do I use capture groups with s/// when I only want to replace the content of $1, rather than the entire match, with my replacement?

For instance, if I do:

$str =~ s/prefix (something) suffix/42/

prefix and suffix are removed. Instead, I would like something to be replaced by 42, while keeping prefix and suffix intact.

Upvotes: 16

Views: 21173

Answers (5)

Dada

Reputation: 6626

Use lookaround assertions. Quoting the documentation:

Lookaround assertions are zero-width patterns which match a specific pattern without including it in $&. Positive assertions match when their subpattern matches, negative assertions match when their subpattern fails. Lookbehind matches text up to the current match position, lookahead matches text following the current match position.

If the prefix has a fixed length, you can thus do:

s/(?<=prefix)(your capture)(?=suffix)/replacement/

However, (?<=...) does not work with variable-length patterns. (Starting with Perl 5.30, it accepts variable-length patterns that match up to 255 characters, which allows the use of | but still rules out *.) The workaround is to use \K instead of (?<=...):

s/.*prefix\K(your capture)(?=suffix)/replacement/
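
To make the two forms concrete, here is a minimal sketch applied to the example from the question. The variable names and the extra spaces in the second string are only for illustration, and \K needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# Fixed-length prefix: lookbehind and lookahead keep both sides out of the match,
# so only "something" is replaced.
my $fixed = "prefix something suffix";
$fixed =~ s/(?<=prefix )something(?= suffix)/42/;
print "$fixed\n";

# Variable-length prefix: \K drops everything matched so far from the text
# that gets replaced, so \s+ is allowed before it.
my $variable = "prefix    something suffix";
$variable =~ s/prefix\s+\Ksomething(?= suffix)/42/;
print "$variable\n";
```

In both cases the prefix and suffix stay in the string untouched, since they are never part of what s/// replaces.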

Upvotes: 1

Lewis R

Reputation: 478

I use something like this:

s/(?<=prefix)(group)(?=suffix)/$1 =~ s|text|rep|gr/e;

Example:

In the following text I want to normalize the whitespace, but only after ::=:

some    text     ::= a   b        c d   e   ;

This can be achieved with:

s/(?<=::=)(.*)/$1 =~ s|\s+| |gr/e

Result:

some    text     ::= a b c d e ;

Explanation:

(?<=::=): Look-behind assertion to match ::=

(.*): Everything after ::=

$1 =~ s|\s+| |gr: Normalize the whitespace in the captured group. The r modifier returns a modified copy instead of trying to change $1, which is read-only. A different delimiter (|) is used for the inner substitution so that it does not terminate the outer replacement expression.

/e: Treat the replacement text as a perl expression.
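
Put together as a runnable script, the pieces explained above might look like the following sketch. The variable names are made up for the example, and the r modifier requires Perl 5.14 or later.

```perl
use strict;
use warnings;

my $line = "some    text     ::= a   b        c d   e   ;";

# Work on a copy so the original line is kept for comparison.
# /e evaluates the replacement as code; the inner s|||gr returns
# a whitespace-normalized copy of $1 without modifying $1 itself.
(my $normalized = $line) =~ s/(?<=::=)(.*)/$1 =~ s|\s+| |gr/e;

print "$normalized\n";
```

Everything before ::= keeps its original spacing because the lookbehind is not part of the replaced text.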

Upvotes: 1

user507077

Reputation:

If you only need to replace one capture then using @LAST_MATCH_START and @LAST_MATCH_END (with use English; see perldoc perlvar) together with substr might be a viable choice:

use English qw(-no_match_vars);
if ($your_string =~ m/aaa (bbb) ccc/) {
    # replaces "bbb" with "new content"; only safe to do after a successful match
    substr $your_string, $LAST_MATCH_START[1], $LAST_MATCH_END[1] - $LAST_MATCH_START[1], "new content";
}
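
For reference, @LAST_MATCH_START and @LAST_MATCH_END are the English names of the built-in arrays @- and @+, so the same idea can be sketched without loading English. The string and variable name below are only an illustration.

```perl
use strict;
use warnings;

my $str = "aaa bbb ccc";

if ($str =~ /aaa (bbb) ccc/) {
    # $-[1] is the start offset of capture group 1, $+[1] is its end offset,
    # so their difference is the length of the captured text.
    substr($str, $-[1], $+[1] - $-[1], "new content");
}

print "$str\n";
```

The four-argument form of substr replaces the given span in place, so the text around the capture is never touched.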

Upvotes: 3

Jabda

Reputation: 1792

This is an old question, but I found the code below easier for changing lines that start with >something to >something_else. It is good for changing the headers of FASTA sequences.

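  while ($filelines =~ />(.*)\s/g) {
        my $header = $1;    # copy it: $1 is overwritten by the next successful match
        unless ($header =~ /else/i) {
                # \Q...\E makes any regex metacharacters in the header match literally
                $filelines =~ s/(\Q$header\E)/$1\_else/;
        }
  }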
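
The same renaming can probably be done in a single substitution. The following is a sketch under the assumption that headers start at the beginning of a line and that any header already containing "else" should be left alone. The sample data and variable names are invented.

```perl
use strict;
use warnings;

# Made-up FASTA-like input: three headers, one of them already renamed.
my $fasta = ">seq1\nACGT\n>seq2_else\nTTGA\n>seq3\nGGCC\n";

# /m lets ^ and $ match at every line, /i makes "else" case-insensitive,
# and the negative lookahead skips headers that already contain "else".
# ${1} is written with braces so the following "_else" is plain text.
(my $renamed = $fasta) =~ s/^(>(?!.*else).*)$/${1}_else/gim;

print $renamed;
```

Unlike the loop, this does not modify the string while a //g match is iterating over it, so the match position is never reset.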

Upvotes: 2

Birei

Reputation: 36272

As I understand it, you can use look-ahead or look-behind assertions, which don't consume characters. Alternatively, capture the surrounding text in groups and put it back in the replacement, so that only the part you are targeting is removed. Examples:

With look-ahead:

s/your_text(?=ahead_text)//;

Grouping data:

s/(your_text)(ahead_text)/$2/;
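
Applied to the example from the question, the grouping approach could look like this sketch. Note the braces in ${1}: without them Perl would read $142 as capture group 142 rather than group 1 followed by 42.

```perl
use strict;
use warnings;

my $str = "prefix something suffix";

# Capture what should survive and put it back around the new value.
$str =~ s/(prefix )something( suffix)/${1}42$2/;

print "$str\n";
```

This works with prefixes and suffixes of any length, at the cost of rewriting text that does not actually change.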

Upvotes: 20
