nanker

Reputation: 603

sed's 'N' command working intermittently

Here is an example block of text I want to format:

<tr><td></td><td>tear a cat in, to make all split.</td><td></td></tr>
<tr><td></td><td class="tdci">The raging rocks</td><td></td></tr>
<tr><td></td><td class="tdci">The foolish Fates.</td></tr>
<tr><td></td><td>This was lofty! Now name the rest of the players.</td><td></td></tr>

Using these two sed commands in a script:

sed -ri '/^<tr><td><\/td><td>/N;s/(\n<tr><td><\/td><td class="tdci">)/\n<tr><td>\&nbsp;<\/td><\/tr>\1/' "$f"   #insert table row with empty data fields (blank line) above first line with 'class="tdci"'
sed -ri '/^<tr><td><\/td><td class="tdci">/N;s/(\n<tr><td><\/td><td>)/\n<tr><td>\&nbsp;<\/td><\/tr>\1/' "$f"   #insert table row with empty data fields (blank line) after last line with 'class="tdci"'

Here is the result:

<tr><td></td><td>tear a cat in, to make all split.</td><td></td></tr>
<tr><td>&nbsp;</td></tr>
<tr><td></td><td class="tdci">The raging rocks</td><td></td></tr>
<tr><td></td><td class="tdci">The foolish Fates.</td></tr>
<tr><td></td><td>This was lofty! Now name the rest of the players.</td><td></td></tr>

So the first sed command works: it inserts a blank table row above the first line with class="tdci". But the almost identical second sed command, meant to insert a blank table row after the last line with class="tdci", does not work.

I usually save these kinds of edits, which span multiple lines, for vim, since I never have problems with its equivalent command. For some reason, though, sed's N;s/ has always been hit and miss for me, as in this example, where one instance works fine yet a second does not. The script removes all leading/trailing whitespace and any Windows carriage returns (\r) before these commands are run.

Since I have a large number of files to edit, I would of course prefer to get this working in a script, if anyone can see anything obvious I am doing wrong.

Additional details:

Sorry, I forgot to mention that I am running sed on Linux (Debian stable), i.e. GNU sed.

Upvotes: 2

Views: 162

Answers (2)

mklement0

Reputation: 439477

@that other guy's excellent answer shows how to do it with sed.

However, sed can be a brain bender when it comes to problems like these that are somewhat procedural in nature, so here's an awk solution that is probably easier to understand:

# No backslashes needed in the -v values: & and / are not special in awk strings.
awk -v blockRegex='^<tr><td></td><td class="tdci">' \
    -v lineToInsert='<tr><td>&nbsp;</td></tr>' \
  '
    # Print a line BEFORE the FIRST line matching `blockRegex`.
    $0 ~ blockRegex { if (!afterFirst) { print lineToInsert; afterFirst = inBlock = 1 } }
    # Print a line AFTER the LAST (contiguous) line matching `blockRegex`.
    inBlock && $0 !~ blockRegex { print lineToInsert; afterFirst = inBlock = 0 }
    # Print the input line.
    { print }
  ' \
  file

Note that this could be optimized further, but I wanted to keep it simpler to clarify the logic.

  • blockRegex is passed in as a variable (with option -v) to identify blocks of contiguous lines before and after which a line is to be inserted; the line to insert is passed in as variable lineToInsert.
  • $0 ~ blockRegex matches each line in a block of lines of interest and prints the line to insert if it's the first line in the block, as indicated by status variable afterFirst; status variable inBlock indicates that the line at hand is inside a block of interest.
  • inBlock && $0 !~ blockRegex matches the first line after the block of interest and prints the line to insert, then resets the status variables.
  • print simply prints the input line as is.

Note that the use of the status variables relies on uninitialized variables in awk defaulting to 0 (which is treated as false in a Boolean context; similarly, a non-zero value evaluates as true).
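Since the question involves a large number of files: awk has no portable in-place option comparable to GNU sed's -i, so one way to apply the program above to many files is to route each file through a temporary file. The following is a minimal sketch, assuming POSIX sh and mawk or gawk; the function name insert_spacers is hypothetical. Running awk once per file also keeps the status variables from carrying over from one file into the next.

```shell
#!/bin/sh
# Sketch only: insert_spacers and the temp-file handling are illustrative
# assumptions, not part of the answer above. Works with mawk or gawk.

# Apply the awk program to each file named on the command line, in place.
# awk has no portable -i option, so output goes to a temporary file that is
# copied back over the original only if awk succeeds; copying with cat
# (instead of mv) keeps the original file's permissions.
insert_spacers() {
  for f in "$@"; do
    tmpf=$(mktemp) || return 1
    if awk -v blockRegex='^<tr><td></td><td class="tdci">' \
           -v lineToInsert='<tr><td>&nbsp;</td></tr>' '
         $0 ~ blockRegex { if (!afterFirst) { print lineToInsert; afterFirst = inBlock = 1 } }
         inBlock && $0 !~ blockRegex { print lineToInsert; afterFirst = inBlock = 0 }
         { print }
       ' "$f" > "$tmpf"; then
      cat "$tmpf" > "$f"
    fi
    rm -f "$tmpf"
  done
}

# Usage: one sample file holding the rows from the question.
dir=$(mktemp -d)
cat > "$dir/sample.html" <<'EOF'
<tr><td></td><td>tear a cat in, to make all split.</td><td></td></tr>
<tr><td></td><td class="tdci">The raging rocks</td><td></td></tr>
<tr><td></td><td class="tdci">The foolish Fates.</td></tr>
<tr><td></td><td>This was lofty! Now name the rest of the players.</td><td></td></tr>
EOF
insert_spacers "$dir"/*.html
cat "$dir/sample.html"
```

The blank row should appear both above the first and below the last class="tdci" row, giving six rows in total.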

Upvotes: 2

that other guy

Reputation: 123570

Start small! Here's a simpler test case for what you're doing:

a1
b1
b2
a2

Here is your code translated for this test case, trying to insert c1 before the first "b" and c2 after the last:

sed -ri '/a/N; s/(\nb)/\nc1\1/' file
sed -ri '/b/N; s/(\na)/\nc2\1/' file

The first command, like you say, appears to work:

a1
c1
b1
b2
a2

The second does not, and just gives you the same result as above rather than inserting c2.
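To reproduce this, a small script can run both commands on the test case. It is a sketch that assumes GNU sed (as on Debian), since -r and -i are GNU options, and it works in a fresh temporary directory so no existing file named file is touched.

```shell
#!/bin/sh
# Reproduce the test case with GNU sed (-r and -i are GNU options).
cd "$(mktemp -d)" || exit 1
printf '%s\n' a1 b1 b2 a2 > file

sed -ri '/a/N; s/(\nb)/\nc1\1/' file   # inserts c1 as expected
sed -ri '/b/N; s/(\na)/\nc2\1/' file   # silently does nothing
cat file
# a1
# c1
# b1
# b2
# a2
```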

Here's what you probably thought would happen, with the incorrect steps marked "(wrong)":

  1. a1 is read and printed.
  2. c1 is read and printed.
  3. b1 is read.
    • It matches /b/, and b2 is read with N.
    • It doesn't match \na.
    • b1 is printed. (wrong)
  4. b2 is read a second time. (wrong)
    • It matches /b/, and a2 is read with N.
    • It matches \na, so c2 is inserted before a2.
    • b2\nc2\na2 is printed.

Here is what actually happens:

  1. a1 is read and printed.
  2. c1 is read and printed.
  3. b1 is read.
    • It matches /b/, and b2 is read with N.
    • It doesn't match \na.
    • b1\nb2 is printed
  4. a2 is read next, since b2 was already consumed by N in step 3. It doesn't match /b/, so it is printed as is.
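One way to see this for yourself is sed's l command, which (in GNU sed) prints the pattern space unambiguously: an embedded newline appears as \n and the end of the pattern space is marked with $. A sketch on the test data as it stands after the first command:

```shell
#!/bin/sh
# Show each pattern space that the second command ends up printing.
# -n suppresses the normal output, so only l's output appears.
out=$(printf '%s\n' a1 c1 b1 b2 a2 | sed -n '/b/N; l')
printf '%s\n' "$out"
# a1$
# c1$
# b1\nb2$
# a2$
```

The third line shows b1 and b2 sharing one pattern space, and a2 arriving in a cycle of its own, where /b/ never matches.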

Here's a working command:

sed -ri '/b/ { :b; N; s/\na/\nc2&/; te; P; D; bb; }; :e;' file

In pseudocode -- with the roughly corresponding sed part in comments -- this is:

if (input.contains("b")) {                              // /b/ {
  while (true) {                                        // :b
    input += "\n" + readline();                         // N
    if (input.contains("\na")) {                        // s/\na/ ..
      input = input.replace("\na", "\nc2\na");          // .. \nc2&/
      goto exit;                                        // te
    }
    print(input.substring(0, input.indexOf('\n')));     // P
    input = input.substring(input.indexOf('\n') + 1);   // D (restarts the cycle; /b/ matches again)
  }                                                     // bb
}                                                       // }
exit:                                                   // :e
print(input);                                           // end of cycle: pattern space is printed
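Running the working command on the test case (as it looks after the first command inserted c1) confirms that c2 now lands after the last b line. A minimal check, assuming GNU sed, which is needed for -r, -i, and for labels terminated by ";":

```shell
#!/bin/sh
# GNU sed assumed: -r, -i, and ";" ending a label are GNU behavior.
cd "$(mktemp -d)" || exit 1
printf '%s\n' a1 c1 b1 b2 a2 > file
sed -ri '/b/ { :b; N; s/\na/\nc2&/; te; P; D; bb; }; :e;' file
cat file
# a1
# c1
# b1
# b2
# c2
# a2
```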

Translated back to your data:

sed -ri '/^<tr><td><\/td><td class="tdci">/ { :b; N; s/(\n<tr><td><\/td><td>)/\n<tr><td>\&nbsp;<\/td><\/tr>\1/; te; P; D; bb; }; :e' "$f"
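To check the translated command, here is a sketch that runs it on the question's four sample rows with GNU sed (assumed, as on Debian); the file name sample.html is illustrative. Note that this command only adds the row after the last class="tdci" row; the row above the block still has to come from the question's first command.

```shell
#!/bin/sh
# GNU sed assumed; sample.html is an illustrative file name.
cd "$(mktemp -d)" || exit 1
cat > sample.html <<'EOF'
<tr><td></td><td>tear a cat in, to make all split.</td><td></td></tr>
<tr><td></td><td class="tdci">The raging rocks</td><td></td></tr>
<tr><td></td><td class="tdci">The foolish Fates.</td></tr>
<tr><td></td><td>This was lofty! Now name the rest of the players.</td><td></td></tr>
EOF

# Insert a blank row after the last class="tdci" row.
sed -ri '/^<tr><td><\/td><td class="tdci">/ { :b; N; s/(\n<tr><td><\/td><td>)/\n<tr><td>\&nbsp;<\/td><\/tr>\1/; te; P; D; bb; }; :e' sample.html
cat sample.html
```

Because GNU sed's -i implies -s (each file is processed separately), the same command line can simply list many files, for example via a ./*.html glob, without N ever reaching across into the next file.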

Upvotes: 5
