Reputation: 141
I have lines of the form
XXXXXXXXXXXXXXXwordYYYYYYYYYYYYYYYYYYYYYYYYY<R>ZZZZZ
XXXXXXXXXXXXXXX[[YYYYYYYYYYYYYYYYYYYYYYYYYYYYY<R>ZZZZZ
I don't want to get into the syntax issues, but what I want to do with any line that contains <R> is replace it with the following text:
XXXXXXXXXXXXXXX{wordYYYYYYYYYYYYYYYYYYYYYYYYYZZZZZ}
XXXXXXXXXXXXXXX{[[YYYYYYYYYYYYYYYYYYYYYYYYYYYYYZZZZZ}
Getting rid of the <R> is trivial:
str = $0
sub(/<R>/, "", str)
print str
Assume that the string is created by a program that I have no control over, and the transformed representation is processed by yet another program, and I have to somehow (by magic) transform the output of program A into suitable syntax for program B, e.g.
A ... | awk ... | B ...
Somewhere between the sub and the print, I want to surround the data with {} as indicated. The sequences XXX...XXX, YYY...YY and ZZ...ZZ are arbitrary character sequences of arbitrary length, so I want to split the string at the word "word" or at the first [, and retain those characters in the result string. Nothing I have found quite answers this question. The closing } always goes at the end of the line, so that's equally trivial to deal with.
Note: This is a simplified description of a far more complicated syntax, but describing the details of the syntax would not be productive.
Upvotes: 0
Views: 243
Reputation: 5975
If all you want is to surround the last part of the line (starting with word or [) with {}, you could use the GNU awk string function gensub().
gensub() provides an additional feature that is not available in sub() or gsub(): the ability to specify components of a regexp in the replacement text.
awk '{ print gensub(/(word|\[).*$/, "{&}", "g", $0) }' file
Putting it together with your existing code for deleting <R>:
awk '{
str = $0
sub(/<R>/, "", str)
print gensub(/(word|\[).*$/, "{&}", "g", str)
}' file
output:
XXXXXXXXXXXXXXX{wordYYYYYYYYYYYYYYYYYYYYYYYYYZZZZZ}
XXXXXXXXXXXXXXX{[[YYYYYYYYYYYYYYYYYYYYYYYYYYYYYZZZZZ}
Note: I have assumed that your sample input is two lines, so the regex matches until end of line ($). If it is one line, you just have to modify the end of the regex.
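Since gensub() is specific to GNU awk, here is a minimal sketch of the same split using only match(), RSTART and substr(), which should be available in any POSIX awk. The function name wrap_tail is hypothetical, and the sketch assumes that only lines containing <R> should be changed and that the brace opens at the first "word" or "[" on the line.

```shell
# wrap_tail is a hypothetical helper name; it filters lines read from stdin.
wrap_tail() {
  awk '
    /<R>/ {
      sub(/<R>/, "")                   # drop the marker
      if (match($0, /word|\[/)) {      # RSTART = position of the first "word" or "["
        # keep everything before RSTART, brace everything from RSTART to end of line
        $0 = substr($0, 1, RSTART - 1) "{" substr($0, RSTART) "}"
      }
    }
    { print }                          # lines without <R> pass through unchanged
  '
}
```

It would slot into the pipeline from the question as A ... | wrap_tail | B ...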
Upvotes: 0
Reputation: 204015
With a sed that has a -E arg to support EREs, e.g. GNU or OSX/BSD sed:
$ sed -E 's/((word|\[\[).*)<R>(.*)/{\1\3}/' file
XXXXXXXXXXXXXXX{wordYYYYYYYYYYYYYYYYYYYYYYYYYZZZZZ}
XXXXXXXXXXXXXXX{[[YYYYYYYYYYYYYYYYYYYYYYYYYYYYYZZZZZ}
Without -E, using BRE syntax (note that \| for alternation is a GNU extension, so this form is not strictly POSIX):
$ sed 's/\(\(word\|\[\[\).*\)<R>\(.*\)/{\1\3}/' file
XXXXXXXXXXXXXXX{wordYYYYYYYYYYYYYYYYYYYYYYYYYZZZZZ}
XXXXXXXXXXXXXXX{[[YYYYYYYYYYYYYYYYYYYYYYYYYYYYYZZZZZ}
Upvotes: 1
Reputation: 58483
This might work for you (GNU sed):
sed -E 's/^(.*)(word.*)<R>(.*) \1(\[.*)<R>\3$/\1{\2\3}\n\1{\4\3}/' file
Pattern match on a line and substitute using back references and groupings if a match is successful.
N.B. The back references \1 and \3 are used in the LHS of the regexp. The use of Ys in the question is inconsistent, i.e. the runs have different lengths.
Upvotes: 0
Reputation: 785601
You may use this awk with alternation regex:
awk '{sub(/word|\[\[/, "{&"); sub(/<R>/, ""); sub(/$/, "}")} 1' file
XXXXXXXXXXXXXXX{wordYYYYYYYYYYYYYYYYYYYYYYYYYZZZZZ}
XXXXXXXXXXXXXXX{[[YYYYYYYYYYYYYYYYYYYYYYYYYYYYYZZZZZ}
This sed should also work for you:
sed -E 's/word|\[\[/{&/; s/<R>//; s/$/}/' file
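Both commands above append } to every input line. If the real input also has lines without <R> that must stay untouched (the question only mentions lines that contain <R>), one possible sketch is to guard the three substitutions with a /<R>/ address. The function name brace_marked is hypothetical, and -E is assumed to be supported by your sed.

```shell
# brace_marked is a hypothetical helper name; it filters lines read from stdin.
brace_marked() {
  sed -E '/<R>/{
    s/word|\[\[/{&/
    s/<R>//
    s/$/}/
  }'
}
```

The three s commands are the same as in the one-liner; the only change is that they now run only on lines matching /<R>/.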
Upvotes: 0