Reputation: 25

How to KEEP only the last line of consecutive lines starting with the same word?

See this thread: How to remove the second line of consecutive lines starting with the same word?

Instead of keeping the first duplicate consecutive line starting with "TITLE", I would like to only keep the last one, to get from this input:

TITLE something
DATA some data
TITLE something else
DATA some other data
TITLE some more
TITLE extra info
DATA some more data

This output:

TITLE something
DATA some data
TITLE something else
DATA some other data
TITLE extra info
DATA some more data

Also, I'd like to be able to handle an arbitrary number of repetitions, not only 2 (for example, if 7 lines in a row start with "TITLE", only keep the last one).

As in the other post, it can be a perl/bash/sed/awk command that keeps only the last line and outputs the rest of the file as well. I've been working on this for a long time, but I could only find solutions that do the opposite of what I want.

Upvotes: 2

Answers (4)

potong

Reputation: 58430

This might work for you (GNU sed):

sed -r 'N;/^(TITLE ).*\n\1/!P;D' file

This keeps two lines at a time in the pattern space and, if both start with "TITLE ", does not print the first.
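As a quick check that this also copes with runs longer than two, here is a small sketch that feeds three consecutive TITLE lines through the same command. It assumes GNU sed (for `-r` and for printing the last line when `N` runs out of input); the function name `keep_last_sed` is only illustrative.

```shell
# keep_last_sed is a hypothetical wrapper; it reads from stdin or from files.
keep_last_sed() {
  # N  : append the next line, so the pattern space holds two lines
  # !P : print the first line unless both lines start with "TITLE "
  # D  : drop the first line and restart with the remaining one
  sed -r 'N;/^(TITLE ).*\n\1/!P;D' "$@"
}

printf '%s\n' 'TITLE one' 'TITLE two' 'TITLE three' 'DATA x' | keep_last_sed
# prints:
# TITLE three
# DATA x
```

Because `D` restarts the cycle with the second line still in the pattern space, each TITLE line is compared with its successor in turn, so only the last one of a run gets printed.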

Upvotes: 1

Borodin

Reputation: 126722

If you're looking for a Perl one-line solution, like the one in the question that you linked, then this will do:

perl -ne'if (/^TITLE/) {$t = $_} else {print $t, $_; $t = ""}' myfile

Note that it will not print a TITLE line at all unless it is followed by a line that doesn't begin with TITLE, so a run of TITLE lines at the very end of the file is dropped.

Upvotes: 2

Ed Morton

Reputation: 203665

Just reverse the order of lines, then print the now-first occurrence, then reverse them again:

$ tac file | awk '$1!=prev; {prev=$1}' | tac                  
TITLE something
DATA some data
TITLE something else
DATA some other data
TITLE extra info
DATA some more data

or if there can be multiple consecutive DATA lines and you want to keep all of those:

$ tac file | awk '!($1=="TITLE" && $1==prev); {prev=$1}' | tac
TITLE something
DATA some data
TITLE something else
DATA some other data
TITLE extra info
DATA some more data
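Where `tac` is not available, or reading the input twice is undesirable, the same result can probably be had in a single pass by holding back the most recent TITLE line until the run ends. This is a sketch under that assumption rather than a drop-in replacement for the commands above; `keep_last_awk`, `held` and `have` are names invented for the example.

```shell
# keep_last_awk is a hypothetical wrapper; it reads from stdin or from files.
keep_last_awk() {
  awk '
    $1 == "TITLE" { held = $0; have = 1; next }  # remember only the latest TITLE of a run
    have          { print held; have = 0 }       # run ended: emit the last TITLE seen
                  { print }                      # every other line passes through unchanged
    END           { if (have) print held }       # flush a run that ends the input
  ' "$@"
}

printf '%s\n' 'TITLE a' 'TITLE b' 'DATA x' 'DATA y' 'TITLE c' | keep_last_awk
# prints:
# TITLE b
# DATA x
# DATA y
# TITLE c
```

Like the second `tac` variant, this keeps consecutive DATA lines intact and only collapses runs whose first field is exactly TITLE.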

Upvotes: 2

Wintermute

Reputation: 44043

With sed:

sed '/^TITLE/ { :a $! { N; /\nTITLE/ { s/.*\n//; ba; }; }; }' filename

That is:

/^TITLE/ {          # if a line begins with TITLE
  :a                # jump label for looping.
   $! {             # unless we hit the end of input (in case the file
                    # ends with title lines)
     N              # fetch the next line
     /\nTITLE/ {    # if it begins with TITLE as well
       s/.*\n//     # remove the first
       ba           # go back to a
     }
   }
 }

Upvotes: 2
