Scotty Allen

Reputation: 13407

Extract unique lines from a file in order of their first occurrence in bash

I have a file with a list of strings. I'd like to extract out the unique strings, in the order that they first appear in the file.

So, for instance, if my file contains:

foo
bar
foo
bar
baz
bar
foo

I'd like to output:

foo
bar
baz

If I just wanted the unique values, I could use sort input | uniq, but that sorts the result alphabetically.

Upvotes: 3

Views: 394

Answers (5)

rici

Reputation: 241721

I think what Nick was aiming at is something like this:

sort test.txt | uniq | xargs -I{} grep -Fnxm1 {} test.txt | sort -k1n -t: | cut -f2 -d:

Or maybe I'm reading too much into his suggestion. I think the awk answer is much cooler, though.
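As a sanity check, here is the pipeline run against the sample input from the question (the file name test.txt matches the command above). grep -Fnxm1 finds each unique line's first occurrence as a line-number-prefixed match, the second sort orders by that line number, and cut strips the prefix:

```shell
# Build the sample input from the question.
printf '%s\n' foo bar foo bar baz bar foo > test.txt

# For each unique line, grep -F (fixed string) -x (whole line) -n (prefix
# with line number) -m1 (stop at first match) reports where it first appears;
# sort by that number, then drop the number.
sort test.txt | uniq | xargs -I{} grep -Fnxm1 {} test.txt | sort -k1n -t: | cut -f2 -d:
# prints:
# foo
# bar
# baz
```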

Upvotes: 2

Mark Reed

Reputation: 95252

bash 4:

declare -A seen
while IFS= read -r line; do
  if (( ! seen["$line"]++ )); then
    echo "$line"
  fi
done <file.txt
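A usage sketch of the loop (bash 4+ only, since it needs associative arrays): the post-increment returns the old count, so the body fires only on a line's first appearance, and IFS= read -r preserves leading whitespace and backslashes:

```shell
declare -A seen   # associative array: line -> times seen (requires bash 4+)
while IFS= read -r line; do
  if (( ! seen["$line"]++ )); then   # old count was 0: first occurrence
    echo "$line"
  fi
done < <(printf '%s\n' foo bar foo bar baz bar foo)
# prints:
# foo
# bar
# baz
```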

For bash <= 3, I would use something else that has associative arrays, like choroba's Perl solution, or awk:

awk '!seen[$0]++' file.txt

Upvotes: 1

Nick
Nick

Reputation: 2903

I can't quite get it, but something like this:

sort test.txt | uniq | xargs -0 -I {} grep {} test.txt

Maybe someone can fix?

Upvotes: -1

Kevin

Reputation: 56059

Quite simple in awk:

awk '!a[$0]++'
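How it works: a[$0] counts occurrences of each whole line; the post-increment yields the old count, so !a[$0]++ is true only the first time a line is seen, and awk's default action for a true pattern is to print. For example, on the sample input:

```shell
# a[$0] is 0 (falsy) the first time a line appears, so !a[$0]++ is true
# and the line prints; the ++ then suppresses all later duplicates.
printf '%s\n' foo bar foo bar baz bar foo | awk '!a[$0]++'
# prints:
# foo
# bar
# baz
```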

Upvotes: 12

choroba

Reputation: 241868

Simple Perl solution:

perl -ne 'print unless $seen{$_}++'

If your last line does not contain a newline, you might need to change it to

perl -nE 'chomp; say unless $seen{$_}++'

Upvotes: 4
