Meeta Sunil

Reputation: 55

How can I substitute just the last occurrence of -) in a string using sed, bash or awk?

Example input file:

xxx-xxx(-)        xxx   xxx  xxx      -       2e-15   Cytochrome b-c1 complex subunit 9       xxx   xxx:241-77(-)
xxx-xxx(+)        xxx   xxx  xxx      +       3e-24   Probable endo-beta-1,4-glucanase D       xxx   xxx:241-77(+)

I've been trying sed, but without success. I can see that the following two things work:

rev file|sed -e 's/-/M/'|rev
rev file|sed -e 's/)/M/'|rev

But - and ) together do not work:

rev file|sed -e 's/-)/M/'|rev

Upvotes: 0

Views: 1359

Answers (3)

Ed Morton

Reputation: 203368

You don't need multiple commands with chains of pipes or fancy operations. Since sed's regexps are greedy, \(.*\) matches as much of the line as it can, so the -) after it is the last one, and all you need is:

$ sed 's/\(.*\)-)/\1M/' file
xxx-xxx(-)        xxx   xxx  xxx      -       2e-15   Cytochrome b-c1 complex subunit 9       xxx   xxx:241-77(M
xxx-xxx(+)        xxx   xxx  xxx      +       3e-24   Probable endo-beta-1,4-glucanase D       xxx   xxx:241-77(+)
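
A quick way to see the greedy match at work is to feed the command a line with several -) occurrences. The sketch below wraps the same substitution in a function; the name replace_last and the sample line are made up for illustration.

```shell
# replace_last is a hypothetical wrapper around the substitution above.
# \(.*\) grabs as much of the line as it can, so the -) that follows it
# is necessarily the last one on the line.
replace_last() {
  sed 's/\(.*\)-)/\1M/'
}

# Made-up sample line with three occurrences of -)
printf '%s\n' 'a(-) b(-) c(-)' | replace_last
# prints: a(-) b(-) c(M
```

Lines without any -) pass through unchanged, because the s command only acts when the pattern matches.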

Upvotes: 1

Benjamin W.

Reputation: 52122

A general approach for "replace the nth-to-last one of something" with pure (GNU) sed

  1. We want to replace "something", in this case -), with something unique not found elsewhere in your input, say ~B. To make sure that this sequence isn't in your input, we first replace all ~ with ~A:

    sed 's/~/~A/g' infile
    
  2. Replace all "something", in this case -), with ~B, which we now know is unique:

    sed 's/-)/~B/g'
    

    Now your input file looks like this (slightly edited so it fits the line width here):

    xxx-xxx(~B  xxx   -   2e-15   Cytochrome b-c1 complex subunit 9   xxx   xxx:241-77(~B
    xxx-xxx(+)  xxx   +   3e-24   Probable endo-beta-1,4-glucanase D   xxx   xxx:241-77(+)
    
  3. The next command does this: "as long as the line has n + 1 of ~B, replace the first one with -)". The :a and ta are a label to branch to and a conditional branch ("go to label :a if a substitution took place"):

    sed ':a;/~B\(.*~B\)\{1\}/s/~B/-)/;ta'
    

    For the case of n = 1, i.e., we want to replace the last occurrence, the quantifier \{1\} is of course not needed, but it can be changed for other values of n.

    The input file now has a unique ~B where the last -) used to be:

    xxx-xxx(-)  xxx   -   2e-15   Cytochrome b-c1 complex subunit 9   xxx   xxx:241-77(~B
    xxx-xxx(+)  xxx   +   3e-24   Probable endo-beta-1,4-glucanase D   xxx   xxx:241-77(+)
    
  4. We replace that single ~B:

    sed 's/~B/M/'
    

    resulting in

    xxx-xxx(-)  xxx   -   2e-15   Cytochrome b-c1 complex subunit 9   xxx   xxx:241-77(M
    xxx-xxx(+)  xxx   +   3e-24   Probable endo-beta-1,4-glucanase D   xxx   xxx:241-77(+)
    
  5. The rest of the ~B can now be replaced with what they were, -) (a no-op in this case):

    sed 's/~B/-)/g'
    
  6. Finally, we undo the first substitution (which has no effect for this example as the input had no ~ to start with):

    sed 's/~A/~/g'
    

All in a single line:

sed 's/~/~A/g;s/-)/~B/g;:a;/~B\(.*~B\)\{1\}/s/~B/-)/;ta;s/~B/M/;s/~B/-)/g;s/~A/~/g' infile

Or, for readability, over multiple lines:

sed '
s/~/~A/g
s/-)/~B/g
:label
/~B\(.*~B\)\{1\}/s/~B/-)/
t label
s/~B/M/
s/~B/-)/g
s/~A/~/g
' infile

Naturally, for the case of n = 1, there are much simpler solutions, like Ed Morton's answer.
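
To illustrate a value of n other than 1, here is a sketch for n = 2 (the second-to-last -)), run on a made-up input line. Only the quantifier changes from \{1\} to \{2\}. The commands are passed as separate -e options, which is just one way to avoid relying on sed accepting a semicolon right after a label; the function name is hypothetical.

```shell
# Sketch for n = 2: replace the second-to-last -) with M.
# replace_second_to_last is a hypothetical name; the input line is made up.
replace_second_to_last() {
  sed -e 's/~/~A/g' \
      -e 's/-)/~B/g' \
      -e ':a' \
      -e '/~B\(.*~B\)\{2\}/s/~B/-)/' \
      -e 'ta' \
      -e 's/~B/M/' \
      -e 's/~B/-)/g' \
      -e 's/~A/~/g'
}

printf '%s\n' 'a-)b-)c-)d' | replace_second_to_last
# prints: a-)bMc-)d
```

Note that this sketch does not guard against lines with fewer than two -): with a single occurrence, that one would be replaced.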

Upvotes: 0

Jeff Y

Reputation: 2456

It's because rev reverses the characters of each line, so -) does not occur in the reversed version; it is )- in the reversed file:

rev file|sed -e 's/)-/M/'|rev
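
A small sketch on a made-up line shows what sed actually sees after rev, and why matching )- works:

```shell
# Made-up sample line; rev reverses the characters of each line.
line='a-)b-)'

printf '%s\n' "$line" | rev
# prints: )-b)-a

# Match )- in the reversed text, then reverse back.
printf '%s\n' "$line" | rev | sed -e 's/)-/M/' | rev
# prints: a-)bM
```

With a replacement longer than one character, the replacement text would need to be written reversed as well, since the final rev flips it along with everything else.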

Upvotes: 1
