Shaun

Reputation: 490

How does this sed command: "sed -e :a -e '$d;N;2,10ba' -e 'P;D' " work?

I saw a sed command to delete the last 10 rows of data:

 sed -e :a -e '$d;N;2,10ba' -e 'P;D'

But I don't understand how it works. Can someone explain it for me?

UPDATE:

Here is my understanding of this command:

  1. The first script defines a label "a".
  2. The second script first checks whether the line most recently read into the pattern space is the last line. If it is, the "d" command deletes the pattern space and starts the next cycle; if not, "d" is skipped. Then "N" appends the next line of the input file to the pattern space, and "2,10ba" jumps to label "a" if the line just read is one of lines 2 to 10.
  3. The third script runs when the line just read into the pattern space is not one of lines 2 to 10: "P" prints the first line of the pattern space, and then "D" deletes the first line of the pattern space.

My understanding of "$d" is that "d" runs only when sed has read the last line into the pattern space. But it seems that every time "ba" is executed, "d" is executed too, regardless of whether the line just read into the pattern space is the last line. Why?

Upvotes: 0

Views: 597

Answers (2)

user7712945

A simpler alternative with GNU sed, if your data is in a file named 'd':

sed -Ez 's/(.*\n)(.*\n){10}$/\1/' d

                         ^

The 10 marked by the caret is the number of trailing lines to remove.

Move the parentheses to invert the result, i.e. to keep only the last 10 lines:

sed -Ez 's/.*\n((.*\n){10})$/\1/' d
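
As a quick check of both substitutions, here is a sketch that runs them on 12 numbered lines from seq instead of the file d. It assumes GNU sed, since -z is a GNU extension. With -z, a text file without NUL bytes is read as a single record, so the whole input is held in memory. The function names are made up for this example.

```shell
# Hypothetical helper names; both read standard input. Assumes GNU sed (-E, -z).
drop_tail_z() { sed -Ez 's/(.*\n)(.*\n){10}$/\1/'; }
keep_tail_z() { sed -Ez 's/.*\n((.*\n){10})$/\1/'; }

seq 1 12 | drop_tail_z   # prints 1 and 2
seq 1 12 | keep_tail_z   # prints 3 through 12
```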

Upvotes: 0

choroba

Reputation: 241948

:a is a label. $ in the address means the last line, d means delete the pattern space and start the next cycle. N appends the next input line to the pattern space. 2,10 means lines 2 to 10, b means branch (i.e. goto), P prints the first line of the pattern space, and D deletes the first line of the pattern space and restarts the script on what is left, without reading a new line.

In other words, you create a sliding window of size 10. Lines are appended to it until line 11 arrives. From then on, each newly read line is added at the bottom, and the oldest line is printed from the top and dropped, so the window always holds the 10 most recent lines. When the script restarts after the last input line has been read, $d deletes the whole window instead of printing it, which removes the last 10 lines.
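
The same script can be written one command per line with comments, which may make the flow easier to follow. This is only a sketch: drop_last_10 is a hypothetical wrapper name, and seq stands in for real input.

```shell
# Hypothetical wrapper around the sed script from the question.
drop_last_10() {
  sed -e '
    # Label that the branch further down jumps back to.
    :a
    # Fires only when the most recently read line is the last line of input:
    # delete the whole pattern space (the buffered tail) and stop.
    $d
    # Append the next input line to the pattern space.
    N
    # While the line just read is number 2 to 10, keep filling the window.
    2,10ba
    # The pattern space now holds 11 lines: print the oldest one.
    P
    # Remove that line and restart the script without reading new input.
    D
  ' "$@"
}

seq 1 15 | drop_last_10   # prints 1 through 5, one per line
```

The comments sit on their own lines because some sed implementations read a label up to the end of the line.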

You can modify the commands to see what's getting deleted (()), stored (<>), and printed by P ([]):

$ printf '%s\n'  {1..20} | \
    sed -e ':a ${s/^/(/;s/$/)/;p;d};s/^/</;s/$/>/;N;2,10ba;s/^/[/;s/$/]/;P;D'
[<<<<<<<<<<1>
[<2>
[<3>
[<4>
[<5>
[<6>
[<7>
[<8>
[<9>
[<10>
(11]>
12]>
13]>
14]>
15]>
16]>
17]>
18]>
19]>
20])
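
The boundary cases can be checked the same way. This sketch feeds the original command inputs of different lengths (survivors is a hypothetical helper name, and seq stands in for real data). $d is evaluated on every pass through label a, but it fires only when the line just read by N is the last line of input, so inputs of 10 lines or fewer produce no output.

```shell
# Hypothetical helper: run the original command on the numbers 1..$1.
survivors() {
  seq 1 "$1" | sed -e :a -e '$d;N;2,10ba' -e 'P;D'
}

survivors 5    # prints nothing: $d fires while the window is still filling
survivors 10   # prints nothing
survivors 11   # prints 1
survivors 13   # prints 1, 2 and 3
```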

Upvotes: 2
