Reputation: 603
Here is an example block of text I want to format:
<tr><td></td><td>tear a cat in, to make all split.</td><td></td></tr>
<tr><td></td><td class="tdci">The raging rocks</td><td></td></tr>
<tr><td></td><td class="tdci">The foolish Fates.</td></tr>
<tr><td></td><td>This was lofty! Now name the rest of the players.</td><td></td></tr>
using these two 'sed' commands in a script:
sed -ri '/^<tr><td><\/td><td>/N;s/(\n<tr><td><\/td><td class="tdci">)/\n<tr><td>\ <\/td><\/tr>\1/' "$f" #insert table row with empty data fields (blank line) above first line with 'class="tdci"'
sed -ri '/^<tr><td><\/td><td class="tdci">/N;s/(\n<tr><td><\/td><td>)/\n<tr><td>\ <\/td><\/tr>\1/' "$f" #insert table row with empty data fields (blank line) after last line with 'class="tdci"'
here is the result:
<tr><td></td><td>tear a cat in, to make all split.</td><td></td></tr>
<tr><td> </td></tr>
<tr><td></td><td class="tdci">The raging rocks</td><td></td></tr>
<tr><td></td><td class="tdci">The foolish Fates.</td></tr>
<tr><td></td><td>This was lofty! Now name the rest of the players.</td><td></td></tr>
So the first sed command works by inserting a blank table row above the first line with class="tdci", but the almost identical second sed command, meant to insert a blank table row after the last line with class="tdci", does not work.
I usually save these kinds of multi-line edits for vim, since I never have problems with its similar command, but for some reason sed's N;s/ has always been hit and miss for me, as in this example, where one instance works fine yet the second does not. The script removes all leading/trailing whitespace and any Windows carriage returns (\r) before these commands get run.
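For reference, a cleanup pass like the one described could look roughly like this. The actual script is not shown in the question, so the command and the sample input below are assumptions for illustration only.

```shell
#!/bin/sh
# Hypothetical cleanup pass; the real script from the question is not shown.
# [[:space:]] also matches \r, so trailing carriage returns are stripped too.
cleaned=$(printf '  <tr><td></td></tr> \r\n\t<tr>x</tr>\r\n' |
  sed -E 's/^[[:space:]]+//; s/[[:space:]]+$//')
printf '%s\n' "$cleaned"
```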
Since I have a large number of files to edit I would of course prefer to get this working in a script if anyone might be able to see anything obvious I am doing wrong.
Additional details:
Sorry, I forgot to mention that I am running sed on Linux (Debian stable).
Upvotes: 2
Views: 162
Reputation: 439477
@that other guy's excellent answer shows how to do it with sed.
However, sed can be a brain bender when it comes to problems like these that are somewhat procedural in nature, so here's an awk solution that is probably easier to understand:
awk -v blockRegex='^<tr><td></td><td class="tdci">' \
-v lineToInsert='<tr><td> </td></tr>' \
'
# Print a line BEFORE the FIRST line matching `blockRegex`.
$0 ~ blockRegex { if (!afterFirst) {print lineToInsert; afterFirst=inBlock=1} }
# Print a line AFTER the LAST (contiguous) line matching `blockRegex`.
inBlock && $0 !~ blockRegex { print lineToInsert; afterFirst=inBlock=0 }
# Print the input line.
{ print }
' \
file
Note that this could be optimized further, but I wanted to keep it simple to clarify the logic.
blockRegex is passed in as a variable (with option -v) to identify blocks of contiguous lines before and after which a line is to be inserted, with the line to be inserted passed in as variable lineToInsert.
$0 ~ blockRegex matches each line in a block of lines of interest and prints the line to insert if it's the first line in the block, as indicated by status variable afterFirst; status variable inBlock indicates that the line at hand is inside a block of interest.
inBlock && $0 !~ blockRegex matches the first line after the block of interest and prints the line to insert, then resets the status variables.
{ print } simply prints the input line as is.
Note that the use of the status variables relies on uninitialized variables in awk defaulting to 0 (which is treated as false in a Boolean context; similarly, a non-zero value evaluates as true).
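As a rough check of the logic described above, here is a sketch that runs a single-flag variant of the same idea on the rows from the question. Collapsing the two status variables into one inBlock flag is my simplification, not the answer's code, and the inserted row is assumed to contain a literal space.

```shell
#!/bin/sh
# Sample rows copied from the question.
input='<tr><td></td><td>tear a cat in, to make all split.</td><td></td></tr>
<tr><td></td><td class="tdci">The raging rocks</td><td></td></tr>
<tr><td></td><td class="tdci">The foolish Fates.</td></tr>
<tr><td></td><td>This was lofty! Now name the rest of the players.</td><td></td></tr>'

result=$(printf '%s\n' "$input" | awk \
  -v blockRegex='^<tr><td></td><td class="tdci">' \
  -v lineToInsert='<tr><td> </td></tr>' '
  # First line of a block: insert before it and remember we are inside.
  $0 ~ blockRegex && !inBlock { print lineToInsert; inBlock = 1 }
  # First line after a block: insert before it and reset the flag.
  inBlock && $0 !~ blockRegex { print lineToInsert; inBlock = 0 }
  { print }
')
printf '%s\n' "$result"
```

The output has six lines, with the inserted row on lines 2 and 5. As in the original, a block that runs to the end of the file gets no trailing row, because no following line triggers the second rule.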
Upvotes: 2
Reputation: 123570
Start small! Here's a simpler test case for what you're doing:
a1
b1
b2
a2
Here is your code translated for this test case, trying to insert c1 before the first "b" and c2 after the last:
sed -ri '/a/N; s/(\nb)/\nc1\1/' file
sed -ri '/b/N; s/(\na)/\nc2\1/' file
The first command, like you say, appears to work:
a1
c1
b1
b2
a2
The second does not, and just gives you the same result as above rather than inserting c2.
Here's what you probably thought would happen, with the incorrect parts marked:
1. a1 is read and printed.
2. c1 is read and printed.
3. b1 is read.
   - It matches /b/, and b2 is read with N.
   - The substitution does not match \na.
   - b1 is printed. (incorrect)
4. b2 is read a second time. (incorrect, as is everything that follows from it)
   - It matches /b/, and a2 is read with N.
   - The substitution matches \na. c2 is appended.
   - b2\nc2\na2 is printed.
Here is what actually happens:
1. a1 is read and printed.
2. c1 is read and printed.
3. b1 is read.
   - It matches /b/, and b2 is read with N.
   - The substitution does not match \na.
   - b1\nb2 is printed.
4. a2 is read and printed, because b2 has already been read above.
Here's a working command:
sed -ri '/b/ { :b; N; s/\na/\nc2&/; te; P; D; bb; }; :e;' file
In pseudocode -- with roughly corresponding sed part in comments -- this is:
if (input.matches("b")) {                                  // /b/ {
    while (true) {                                         // :b
        input += "\n" + readline();                        // N
        if (input.matches("\na")) {                        // s/\na/ ..
            input = input.replace("(\na)", "\nc2\1");      // .. \nc2&/
            goto exit;                                     // te
        }
        print(input.substring(0, input.indexOf('\n')));    // P
        input = input.substring(input.indexOf('\n') + 1);  // D
    }                                                      // bb
}                                                          // }
:exit                                                      // :e
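To check the walkthrough against real behaviour, the sketch below prints the pattern space at the end of each cycle of the original second command using sed's l command, then runs the working command on the same test input. It assumes GNU sed; the variable names are only illustrative.

```shell
#!/bin/sh
# Test input after the first command has already inserted c1.
sample=$(printf '%s\n' a1 c1 b1 b2 a2)

# "l" prints the pattern space unambiguously (\n for embedded newlines,
# $ at the end), so it shows which lines N has glued together.
trace=$(printf '%s\n' "$sample" | sed -n '/b/N; l')
printf '%s\n' "$trace"
# a1$
# c1$
# b1\nb2$
# a2$

# The working command: P/D slide the window forward one line at a time,
# so b2 is still in the pattern space when a2 is appended.
fixed=$(printf '%s\n' "$sample" | sed -r '/b/ { :b; N; s/\na/\nc2&/; te; P; D; bb; }; :e;')
printf '%s\n' "$fixed"
```

The trace shows b1 and b2 sharing one pattern space and a2 arriving alone, which is why \na never has anything to match in the original command.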
Translated back to your data:
sed -ri '/^<tr><td><\/td><td class="tdci">/ { :b; N; s/(\n<tr><td><\/td><td>)/\n<tr><td>\ <\/td><\/tr>\1/; te; P; D; bb; }; :e' "$f"
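As a sketch of how both insertions might be done robustly, the same P/D loop can also be applied to the "before the first" case. This is my extension rather than part of the answer. It assumes GNU sed, writes the blank row with a literal space, and adds one hypothetical extra row at the top of the question's sample so that the plain rows before the block come in a pair.

```shell
#!/bin/sh
# The first row is a hypothetical addition; the other rows are from the question.
input='<tr><td></td><td>extra row</td><td></td></tr>
<tr><td></td><td>tear a cat in, to make all split.</td><td></td></tr>
<tr><td></td><td class="tdci">The raging rocks</td><td></td></tr>
<tr><td></td><td class="tdci">The foolish Fates.</td></tr>
<tr><td></td><td>This was lofty! Now name the rest of the players.</td><td></td></tr>'

# Blank row before the first tdci row: slide over the plain rows one at a time.
before='/^<tr><td><\/td><td>/ { :b; N; s/(\n<tr><td><\/td><td class="tdci">)/\n<tr><td> <\/td><\/tr>\1/; te; P; D; bb; }; :e'
# Blank row after the last tdci row: same loop, keyed on the tdci rows.
after='/^<tr><td><\/td><td class="tdci">/ { :b; N; s/(\n<tr><td><\/td><td>)/\n<tr><td> <\/td><\/tr>\1/; te; P; D; bb; }; :e'

result=$(printf '%s\n' "$input" | sed -r "$before" | sed -r "$after")
printf '%s\n' "$result"
```

On this input the plain N;s/ version of the first command would insert nothing, because the two plain rows get paired with each other and the first tdci row is then read on its own. The loop version does not depend on how the rows happen to pair up.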
Upvotes: 5