Reputation: 55
Example input file:
xxx-xxx(-) xxx xxx xxx - 2e-15 Cytochrome b-c1 complex subunit 9 xxx xxx:241-77(-)
xxx-xxx(+) xxx xxx xxx + 3e-24 Probable endo-beta-1,4-glucanase D xxx xxx:241-77(+)
I've been trying sed, but without success. I can see that the following two things work:
rev file|sed -e 's/-/M/'|rev
rev file|sed -e 's/)/M/'|rev
But - and ) together do not work:
rev file|sed -e 's/-)/M/'|rev
Upvotes: 0
Views: 1359
Reputation: 203368
You don't need multiple commands with chains of pipes or fancy operations - since sed's regexps are greedy, all you need is:
$ sed 's/\(.*\)-)/\1M/' file
xxx-xxx(-) xxx xxx xxx - 2e-15 Cytochrome b-c1 complex subunit 9 xxx xxx:241-77(M
xxx-xxx(+) xxx xxx xxx + 3e-24 Probable endo-beta-1,4-glucanase D xxx xxx:241-77(+)
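To see the greedy match at work on a line with more than one -), here is a minimal sketch. The function name replace_last and the sample line are made up for illustration; they are not part of the question.

```shell
# Hypothetical helper: replace only the last "-)" on each input line with "M".
replace_last() {
    # \(.*\) is greedy, so it consumes everything up to the final "-)".
    # Lines without "-)" pass through unchanged.
    sed 's/\(.*\)-)/\1M/'
}

printf '%s\n' 'a(-) b(-) c(-)' | replace_last
# prints: a(-) b(-) c(M
```

Because .* grabs as much as it can, the literal -) in the pattern can only match the final occurrence on the line.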
Upvotes: 1
Reputation: 52122
We want to replace "something", in this case -), with something unique not found elsewhere in your input, say ~B. To make sure that this sequence isn't in your input, we first replace all ~ with ~A:
sed 's/~/~A/g' infile
Replace all "something", in this case -), with ~B, of which we now know that it'll be unique:
sed 's/-)/~B/g'
Now your input file looks like this (slightly edited so it fits the line width here):
xxx-xxx(~B xxx - 2e-15 Cytochrome b-c1 complex subunit 9 xxx xxx:241-77(~B
xxx-xxx(+) xxx + 3e-24 Probable endo-beta-1,4-glucanase D xxx xxx:241-77(+)
The next command does this: "as long as the line has at least n + 1 occurrences of ~B, replace the first one with -)". The :a and ta are a label to branch to and a conditional branch ("go to label :a if a substitution took place"):
sed ':a;/~B\(.*~B\)\{1\}/s/~B/-)/;ta'
For the case of n = 1, i.e., we want to replace the last occurrence, the quantifier \{1\} is of course not needed, but it can be changed for other values of n.
The input file now has a unique ~B where the last -) used to be:
xxx-xxx(-) xxx - 2e-15 Cytochrome b-c1 complex subunit 9 xxx xxx:241-77(~B
xxx-xxx(+) xxx + 3e-24 Probable endo-beta-1,4-glucanase D xxx xxx:241-77(+)
We replace that single ~B:
sed 's/~B/M/'
resulting in
xxx-xxx(-) xxx - 2e-15 Cytochrome b-c1 complex subunit 9 xxx xxx:241-77(M
xxx-xxx(+) xxx + 3e-24 Probable endo-beta-1,4-glucanase D xxx xxx:241-77(+)
The rest of the ~B can now be replaced with what they were, -) (a no-op in this case):
sed 's/~B/-)/g'
Finally, we undo the first substitution (which has no effect for this example as the input had no ~ to start with):
sed 's/~A/~/g'
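All in a single line (this relies on GNU sed accepting a semicolon after a label or branch command; other sed implementations may need the multi-line form):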
sed 's/~/~A/g;s/-)/~B/g;:a;/~B\(.*~B\)\{1\}/s/~B/-)/;ta;s/~B/M/;s/~B/-)/g;s/~A/~/g' infile
Or, for readability, over multiple lines:
sed '
s/~/~A/g
s/-)/~B/g
:label
/~B\(.*~B\)\{1\}/s/~B/-)/
t label
s/~B/M/
s/~B/-)/g
s/~A/~/g
' infile
Naturally, for the case of n = 1, there are much simpler solutions, like Ed Morton's answer.
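For other values of n, the same script could be wrapped in a small function. This is only a sketch: the name replace_nth_last is hypothetical, and it assumes the argument is a positive integer. Each command goes in its own -e so the label does not depend on GNU-specific semicolon handling.

```shell
# Hypothetical wrapper: replace the n-th "-)" counted from the end of each
# line with "M". $1 is n and is assumed to be a positive integer.
replace_nth_last() {
    # 1. escape "~" so that "~B" cannot already occur in the input
    # 2. mark every "-)" as "~B"
    # 3. while more than n markers remain, turn the first one back into "-)"
    # 4. the first remaining marker is the n-th from the end: make it "M"
    # 5. restore the other markers, then undo the "~" escaping
    # Note: a line with fewer than n occurrences still gets its first one replaced.
    sed -e 's/~/~A/g' \
        -e 's/-)/~B/g' \
        -e ':label' \
        -e '/~B\(.*~B\)\{'"$1"'\}/s/~B/-)/' \
        -e 't label' \
        -e 's/~B/M/' \
        -e 's/~B/-)/g' \
        -e 's/~A/~/g'
}

printf '%s\n' 'a-) b-) c-)' | replace_nth_last 2
# prints: a-) bM c-)
```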
Upvotes: 0
Reputation: 2456
It's because rev reverses the order of the characters on each line, so -) does not occur in the reversed version; it appears as )- instead:
rev file|sed -e 's/)-/M/'|rev
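A quick way to convince yourself is to look at what rev produces for a short line. The helper name replace_last_rev and the sample line below are made up for illustration.

```shell
# Hypothetical helper: replace the last "-)" on each line by reversing the
# line, replacing the first ")-", and reversing it back.
replace_last_rev() {
    rev | sed -e 's/)-/M/' | rev
}

printf '%s\n' 'a(-) b(-)' | rev
# prints: )-(b )-(a

printf '%s\n' 'a(-) b(-)' | replace_last_rev
# prints: a(-) b(M
```

One thing to keep in mind with this approach: a replacement longer than one character would have to be written reversed as well, since the second rev flips it along with the rest of the line.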
Upvotes: 1