Graham

Reputation: 1759

Print only the last occurrence of each duplicate line

I've got stdout from a command from which I'd like to strip duplicate lines, but in reverse order.

That is, I'd like the earlier occurrences of a duplicated line stripped and the last one kept, rather than the other way round. For example, to keep the first occurrence and strip the later ones I might use the classic technique with awk:

awk '!a[$0]++'

While brilliant, it strips the wrong lines:

$ printf 'one\nfour\ntwo\nthree\nfour\n' | awk '!a[$0]++'
one
four
two
three

I'd like the last occurrence of four to be printed instead, i.e.:

$ printf 'one\nfour\ntwo\nthree\nfour\n' | <script>
one
two
three
four

How do I do this? Is there a simple way with a one-liner in shell?

Upvotes: 2

Views: 358

Answers (2)

ghoti

Reputation: 46896

Using your example to generate input for testing:

printf 'one\nfour\ntwo\nthree\nfour\n'

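The easiest way to handle this is simply to reverse your data, twice. Once the input is reversed, the last occurrence of each line comes first, which is exactly the one that awk '!a[$0]++' keeps; reversing the result again restores the original order. The following works on BSD and OS X: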

command | tail -r | awk '!a[$0]++' | tail -r

But tail's -r option isn't universal. If you're on Linux, you can get the same effect with the tac command (the opposite of cat), which is part of coreutils:

command | tac | awk '!a[$0]++' | tac

If neither of these works (e.g. you're on HP-UX or older Solaris), you may be able to reverse things using sed:

command | sed '1!G;h;$!d' | awk '!a[$0]++' | sed '1!G;h;$!d'
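
If you don't know in advance which of these tools a machine has, the three variants above can be wrapped in one function. This is only a sketch: reverse_lines and last_occurrences are names I made up for illustration, the `tail -r` probe on empty input is my own assumption about how to detect BSD tail, and `command -v` here is the shell builtin, not the placeholder "command" used in the pipelines above.

```shell
# reverse_lines: hypothetical helper; reverses stdin line by line
# with whichever of the tools discussed above is available.
reverse_lines() {
  if command -v tac >/dev/null 2>&1; then
    tac                                   # GNU coreutils
  elif tail -r </dev/null >/dev/null 2>&1; then
    tail -r                               # BSD / OS X
  else
    # 1!G  every line but the first: append the hold space to the current line
    # h    save the result (current line + all earlier lines, reversed) in the hold space
    # $!d  every line but the last: delete instead of printing
    sed '1!G;h;$!d'
  fi
}

# last_occurrences: hypothetical helper; keeps only the last copy of each line.
last_occurrences() {
  reverse_lines | awk '!seen[$0]++' | reverse_lines
}

printf 'one\nfour\ntwo\nthree\nfour\n' | last_occurrences
```

With the sample input from the question this should print one, two, three, four, each on its own line.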

Of course, you could do this with perl as well:

command | perl -e 'print reverse <>' | awk '!a[$0]++' | perl -e 'print reverse <>'

But if perl is available on your system, you might as well simplify the pipe and skip awk entirely:

command | perl -e 'print reverse grep { !$a{$_}++ } reverse <>'

I've never really liked perl, though, and I do like doing things in shell. If you're in bash (version 4 or up), and you don't care much about performance, you can implement an array right in your shell:

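mapfile -t a < <(command)
declare -A b
for (( i=${#a[@]}-1; i>=0; i-- )); do   # walk backwards; the "x" prefix keeps empty lines usable as keys
  if [[ ${b["x${a[i]}"]+seen} ]]; then unset "a[$i]"; else b["x${a[i]}"]=1; fi
done
printf '%s\n' "${a[@]}"   # surviving lines, still in their original order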

No external tools required. :-)

UPDATE:

Inspired (or perhaps challenged) by sudo_O's answer, here's one more option that works in pure awk on BSD (i.e. doesn't require GNU awk):

command | awk '{a[NR]=$0;b[$0]=NR} END {for(i=1;i<=NR;i++) if(i==b[a[i]]) print a[i]}'

Note that this keeps every input line in a and every distinct line again as a key of b, so it may be inappropriate for large datasets.
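
If the data is already in a file (or you can afford to write it to one), one way around the memory cost is to read the file twice, so that awk only has to remember one line number per distinct line. A minimal sketch, assuming the input is a regular file rather than a pipe; last_occurrences_file is a hypothetical name, and the temporary file is just for the demonstration:

```shell
# last_occurrences_file FILE: hypothetical helper; reads FILE twice.
last_occurrences_file() {
  # Pass 1 (NR==FNR): record the line number of each line's final appearance.
  # Pass 2: print a line only if it sits at that recorded line number.
  awk 'NR==FNR { last[$0] = FNR; next } last[$0] == FNR' "$1" "$1"
}

tmp=$(mktemp)
printf 'one\nfour\ntwo\nthree\nfour\n' > "$tmp"
last_occurrences_file "$tmp"
rm -f "$tmp"
```

This doesn't work on a pipe, since stdin can't be read twice, and a file name containing "=" would be taken by awk as a variable assignment rather than a file.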

Upvotes: 5

Chris Seymour

Reputation: 85883

In practice I would use ghoti's technique (reversing the input twice), but here is a single GNU awk script that prints the last occurrences:

command | awk '{a[$0]=NR;b[NR]=$0}END{n=asort(a);for(i=1;i<=n;i++)print b[a[i]]}'
one
two
three
four

Upvotes: 2
