Reputation: 1759
I've got stdout from a command for which I'd like to strip duplicates in reverse order.
That is, I'd like the duplicate lines stripped from the beginning, not from the end. For example, to strip from the end I might use the classic technique with awk:
awk '!a[$0]++'
While brilliant, it strips the wrong lines:
$ printf 'one\nfour\ntwo\nthree\nfour\n' | awk '!a[$0]++'
one
four
two
three
I'd like the last occurrence of four to be the one that is printed, i.e.
$ printf 'one\nfour\ntwo\nthree\nfour\n' | <script>
one
two
three
four
How do I do this? Is there a simple way with a one-liner in shell?
Upvotes: 2
Views: 358
Reputation: 46896
Using your example to generate input for testing:
printf 'one\nfour\ntwo\nthree\nfour\n'
The easiest way to handle this is simply to reverse your data, twice. The following works in BSD and OS X:
command | tail -r | awk '!a[$0]++' | tail -r
But the -r option isn't universal. If you're on Linux, you can get the same effect with the tac command (the opposite of cat), which is part of coreutils:
command | tac | awk '!a[$0]++' | tac
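If the same script has to run on both kinds of system, the two variants above can be wrapped in a small function that picks whichever reversal tool is installed. This is only a sketch: the name dedup_keep_last is made up for illustration, and it assumes that at least one of tac or tail -r is available.

```shell
# Hypothetical helper: keep only the last occurrence of each input line.
# Assumes either tac (GNU coreutils) or a tail that supports -r (BSD / OS X).
dedup_keep_last() {
    if command -v tac >/dev/null 2>&1; then
        # Reverse, keep the first sighting of each line, reverse back.
        tac | awk '!seen[$0]++' | tac
    else
        tail -r | awk '!seen[$0]++' | tail -r
    fi
}

printf 'one\nfour\ntwo\nthree\nfour\n' | dedup_keep_last
```

With the sample input this should print one, two, three, four, each on its own line.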
If neither of these works (e.g. you're on HP-UX or older Solaris), you may be able to reverse things using sed:
command | sed '1!G;h;$!d' | awk '!a[$0]++' | sed '1!G;h;$!d'
Of course, you could do this with perl as well:
command | perl -e 'print reverse <>' | awk '!a[$0]++' | perl -e 'print reverse <>'
But if perl is available on your system, you might as well simplify the pipe and skip awk entirely:
command | perl -e 'print reverse grep { !$a{$_}++ } reverse <>'
I've never really liked perl, though, and I do like doing things in shell. If you're in bash (version 4 or up), and you don't care much about performance, you can implement an array right in your shell:
mapfile -t a < <(command)
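declare -A b
for i in "${!a[@]}"; do b["_${a[i]}"]=$i; done   # remember the last index of each line; the "_" prefix keeps empty lines usable as keys
for i in "${!a[@]}"; do [[ ${b["_${a[i]}"]} -eq $i ]] && printf '%s\n' "${a[i]}"; done   # print a line only at its last index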
No external tools required. :-)
UPDATE:
Inspired (or perhaps challenged) by sudo_O's answer, here's one more option that works in pure awk on BSD (i.e. doesn't require GNU awk):
command | awk '{a[NR]=$0;b[$0]=NR} END {for(i=1;i<=NR;i++) if(i==b[a[i]]) print a[i]}'
Note that this stores all input in memory twice, so it may be inappropriate for large datasets.
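If memory is the concern, one possible variation keeps only the distinct lines instead of every input line. The sketch below is not from the answers above: the function name dedup_keep_last_awk and the array names last and at are invented for illustration. It should work in any POSIX awk, since it avoids gawk-only functions such as asort.

```shell
# Hypothetical variant: memory grows with the number of distinct lines,
# not with the total number of input lines.
dedup_keep_last_awk() {
    awk '
        { last[$0] = NR }    # line text -> number of its most recent occurrence
        END {
            # Invert the map: line number -> line text.
            for (line in last) at[last[line]] = line
            # Walk the line numbers in order, printing only the surviving ones.
            for (i = 1; i <= NR; i++) if (i in at) print at[i]
        }
    '
}

printf 'one\nfour\ntwo\nthree\nfour\n' | dedup_keep_last_awk
```

The trade-off is that the END loop still runs over every line number, so it saves memory but not time on input with many duplicates.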
Upvotes: 5
Reputation: 85883
In practice I would use ghoti's technique (reversing the input with tac), but here is a single GNU awk script that prints the last occurrences:
command | awk '{a[$0]=NR;b[NR]=$0}END{n=asort(a);for(i=1;i<=n;i++)print b[a[i]]}'
one
two
three
four
Upvotes: 2