blueFast

Reputation: 44381

Excluding a string in a regex matching, for sed processing

I need to match this for a substitute command:

whatever__MATCH_THIS__whateverwhatever__AND_THIS__whateverwhatever

I am trying with:

sed -e 's/__\(.*\)__/\{{\1}}/g' myfile

But this matches greedily, from the first __ to the last __ on the line (__MATCH_THIS__whateverwhatever__AND_THIS__), producing:

whatever{{MATCH_THIS__whateverwhatever__AND_THIS}}whateverwhatever

But I wanted:

whatever{{MATCH_THIS}}whateverwhatever{{AND_THIS}}whateverwhatever

How can I specify a string to exclude, in the matching part? I know how to exclude one character (for example [^a]) but not how to exclude a string.

Upvotes: 4

Views: 3471

Answers (7)

Clement

Reputation: 79

I may be wrong, but I think it is as simple as this:

sed -r 's/__(.*)__(.*)__(.*)__/\{{\1}}\2{{\3}}/g'

Test it as follows: (works for me)

echo "whatever__MATCH_THIS__whateverwhatever__AND_THIS__whateverwhatever"|sed -r 's/__(.*)__(.*)__(.*)__/\{{\1}}\2{{\3}}/g'

results in:

whatever{{MATCH_THIS}}whateverwhatever{{AND_THIS}}whateverwhatever

Upvotes: 0

Birei

Reputation: 36272

Here is one way using sed, although it is clearly not the best tool for the job. I've commented the code to explain what happens, because it can look a little confusing:

sed -n '
    ## Insert a newline just before each "__". This is the most
    ## important instruction of the whole script. The trick is that
    ## a newline is the only character that sed cannot find in
    ## a line of input, so it can mark where each "__" to change
    ## is. Each changed part is saved in hold space, but because
    ## sed has only two buffers (pattern space and hold space) I
    ## have to delete and recover data several times between
    ## both.
    s/__/\n&/g

    ## Save in hold space all data up to the first newline,
    ## that is, everything before the first "__" of the line.
    h ; s/\n.*$// ; x

    ## Remove that part just saved in hold space.
    s/^[^\n]*\n//

    ## Set a label to jump to later.
    :a

    ## This is the end condition. When no newline is found
    ## in the pattern space, there are no more "__" to
    ## process, so get all data saved in hold space, print
    ## it and leave hold space empty, ready for the next line of
    ## the input file.
    /^[^\n]\+$/ {
        g
        p
        x
        s/^.*$//
        x
        b
    }

    ## This part of the code processes the next two segments of
    ## the pattern space. The first one starts with the opening
    ## "__" and the second one with the closing "__", so replace
    ## each with the respective curly braces.
    s/__/{{/

    ## Once the substitution has been done, append it to
    ## hold space.
    ## "H" appends the whole pattern space, but I only want to keep
    ## it up to the first newline. The first "s" removes the newline
    ## that "H" adds by itself; the second removes the rest.
    H ; x ; s/\n// ; s/\n.*$// ; x

    ## Delete part just processed and saved in hold space.
    s/^[^\n]*\n//

    ## Repeat the same process for the closing "__".
    s/__/}}/
    H ; x ; s/\n// ; s/\n.*$// ; x
    s/^[^\n]*\n//

    ## Go to label "a".
    ba 
' infile

Paste and run it from the command line. With the two lines you provided, it yields:

whatever{{MATCH_THIS}}whateverwhatever{{AND_THIS}}whateverwhatever
exten => s,n,ExecIf($[${amacode} == 1]?Set(rateparams_view={{INCOMING_RATEPARAMS_VIEW}}):Set(rateparams_view={{OUTGOING_RATEPARAMS_VIEW}}))

Upvotes: 3

chooban

Reputation: 9256

What you need is a non-greedy regex, but unfortunately sed doesn't support that. However, it can be done in Perl.

perl -pe 's|__(.*?)__|{{$1}}|g' <myfile

The question mark after the asterisk makes the quantifier non-greedy, so instead of taking the longest string that matches, it takes the shortest.

Hope that helps.

If you wanted to put this in a Perl script rather than run it on the command line, then something like this will do the job:

#! /usr/bin/perl -w
use strict; # Habit of mine
use 5.010;  # So we can use 'say'

# Save the matching expression in a variable.
# qr// compiles the pattern into a reusable regex (http://perldoc.perl.org/functions/qr.html)
my $regex = qr/__(.*?)__/;

# Ordinarily, I'd write this in a way I consider to be less perl-y and more readable.
# What it's doing is reading each line of the file named on the command line
# (or of STDIN if no file is given) into $_. Then it runs the substitution on
# the line, before printing out the result.
while (<>) {
  chomp;                    # Drop the trailing newline, because 'say' adds one
  $_ =~ s/$regex/{{$1}}/g;
  say $_;
}

Usage is simple:

./regex myfile
whatever{{MATCH_THIS}}whateverwhatever{{AND_THIS}}whateverwhatever

It's Perl, there are a million and one ways to do it!

Upvotes: 2

potong

Reputation: 58430

This might work for you (GNU sed):

sed -r 's/__([^_]+(_[^_]+)*)__/{{\1}}/g' file

or, perhaps easier to understand:

sed -r 's/__/\n/g;s/\n([^\n]*)\n/{{\1}}/g;s/\n/__/g' file
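The second command works in three passes: every __ becomes a newline (a character that cannot already be in the line), each newline-delimited pair becomes {{...}}, and any leftover newline is turned back into __. A small sketch of how that behaves, assuming GNU sed (for -r and for \n in the replacement and in the bracket expression); the function name braces is hypothetical:

```shell
#!/bin/sh
# braces is a hypothetical helper name wrapping the three-pass command above.
# Assumes GNU sed.
braces() {
    sed -r 's/__/\n/g; s/\n([^\n]*)\n/{{\1}}/g; s/\n/__/g'
}

# Single underscores inside a name are kept as they are.
printf '%s\n' 'a__X_Y__b__Z__c' | braces

# An unpaired "__" is restored by the third pass.
printf '%s\n' 'a__X__b__dangling' | braces
```

This prints a{{X_Y}}b{{Z}}c and then a{{X}}b__dangling.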

Upvotes: 1

Endoro

Reputation: 37569

GNU sed

sed ':k s/__/{{/;s/__/}}/;tk' file

input:

whatever__MATCH_THIS__whateverwhatever__AND_THIS__whateverwhatever
blah__XXX_XX__blah_blah_blah__XX_XXX__whateverwhatever

output:

whatever{{MATCH_THIS}}whateverwhatever{{AND_THIS}}whateverwhatever
blah{{XXX_XX}}blah_blah_blah{{XX_XXX}}whateverwhatever
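How the loop works: each pass replaces the first remaining __ with {{ and the next one with }}, and t jumps back to the label for as long as a substitution succeeded. A sketch of the same loop, with the label and the branch in separate -e expressions so that it does not depend on how a given sed parses the text after :k; the function name pair_loop is hypothetical:

```shell
#!/bin/sh
# pair_loop is a hypothetical helper name for the loop shown above.
pair_loop() {
    sed -e ':k' -e 's/__/{{/' -e 's/__/}}/' -e 'tk'
}

# Balanced pairs are converted one pair per pass.
printf '%s\n' 'blah__XXX_XX__blah_blah_blah__XX_XXX__whateverwhatever' | pair_loop

# With an odd number of "__", the last one becomes an unmatched "{{".
printf '%s\n' 'a__X__b__dangling' | pair_loop
```

The first command prints blah{{XXX_XX}}blah_blah_blah{{XX_XXX}}whateverwhatever. The second prints a{{X}}b{{dangling, so this approach assumes that every __ in the input is paired.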

Upvotes: 3

Marichyasana

Reputation: 3154

This works on my Windows XP laptop.

Input command:
echo whatever__MATCH_THIS__whateverwhatever__AND_THIS__whateverwhatever|sed -f a.sed
Output:
whatever{{__MATCH_THIS__}}whateverwhatever{{__AND_THIS__}}whateverwhatever
where a.sed is this:

    /__MATCH_THIS__/{
    /__AND_THIS__/{
    s/__MATCH_THIS__/\{\{__MATCH_THIS__\}\}/
    s/__AND_THIS__/\{\{__AND_THIS__\}\}/
    }
    }

Upvotes: 0

iruvar

Reputation: 23364

sed does not support PCRE goodies such as the non-greedy operator.

I was able to work around this for your input with the following variation:

echo 'whatever__MATCH_THIS__whateverwhatever__AND_THIS__whateverwhatever' |
sed -e 's/__\([^_]\+_[^_]\+\)__/\{{\1}}/g'
whatever{{MATCH_THIS}}whateverwhatever{{AND_THIS}}whateverwhatever
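Note that this pattern matches only names that contain exactly one single underscore, as both names in the sample do. The sketch below contrasts it with a generalized pattern of the form __([^_]+(_[^_]+)*)__, which accepts any number of single underscores. It assumes GNU sed (for \+ and -r), and both function names are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper names; assumes GNU sed.

# Pattern from above: exactly one single underscore inside the name.
one_underscore() {
    sed -e 's/__\([^_]\+_[^_]\+\)__/{{\1}}/g'
}

# Generalized pattern: zero or more single underscores inside the name.
any_underscores() {
    sed -r 's/__([^_]+(_[^_]+)*)__/{{\1}}/g'
}

printf '%s\n' 'a__ONE__b' | one_underscore     # name without underscore: no match
printf '%s\n' 'a__ONE__b' | any_underscores
printf '%s\n' 'a__A_B_C__b' | one_underscore   # two underscores: no match
printf '%s\n' 'a__A_B_C__b' | any_underscores
```

The one_underscore calls leave both lines unchanged, while any_underscores prints a{{ONE}}b and a{{A_B_C}}b.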

Upvotes: 1
