Reputation: 33
I've got files that look like this (there can be more columns or rows):
dif-1-2-3-4.com 1 1 1
dif-1-2-3-5.com 1 1 2
dif-1-2-4-5.com 1 2 1
dif-1-3-4-5.com 2 1 1
dif-2-3-4-5.com 1 1 1
And I want to compare these numbers:
1 1 1
1 1 2
1 2 1
2 1 1
1 1 1
And print only one row per combination, keeping the first occurrence and dropping later repeats, so I get this:
dif-1-2-3-4.com 1 1 1
dif-1-2-3-5.com 1 1 2
dif-1-2-4-5.com 1 2 1
dif-1-3-4-5.com 2 1 1
Upvotes: 0
Views: 88
Reputation: 103754
This works with POSIX and GNU awk:
$ awk '{ s = ""
         for (i=2; i<=NF; i++)
             s = s $i "|"
       }
       s in seen { next }
       ++seen[s]' file
Which can be shortened to:
$ awk '{s=""; for (i=2;i<=NF; i++) s=s $i "|"} !seen[s]++' file
It also supports a variable number of columns.
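The "|" separator is what makes the variable-column case safe: without it, different rows could concatenate to the same key string. A quick way to see this, using made-up sample rows:

```shell
# Two rows whose number fields concatenate to the same string "112"
printf 'a.com 1 12\nb.com 11 2\n' > /tmp/collide.txt

# No separator: both rows build the key "112", so the second row is lost
awk '{s=""; for (i=2;i<=NF;i++) s=s $i} !seen[s]++' /tmp/collide.txt

# With "|": the keys are "1|12|" and "11|2|", so both rows are kept
awk '{s=""; for (i=2;i<=NF;i++) s=s $i "|"} !seen[s]++' /tmp/collide.txt
```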
If you want a sort | uniq solution that also respects file order (i.e. the first of a set of duplicates is printed, not a later one) you need a decorate, sort, undecorate approach. You can:
- use cat -n to decorate the file with line numbers;
- use sort -k3 -k1n to sort first on all the fields from the 3rd through the end of the line, then numerically on the added line number;
- use sort's -u option if your version supports it, or uniq -f2 (skipping the added line number and the domain field, so all the number columns are compared), to keep only the first line in each group of dups;
- finally use sed -e 's/^[[:space:]]*[0-9]*[[:space:]]*//' to remove the added line numbers:
cat -n file | sort -k3 -k1n | uniq -f2 | sed -e 's/^[[:space:]]*[0-9]*[[:space:]]*//'
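Run against the sample data from the question, this decorate/sort/dedupe/undecorate pipeline keeps the first of the two "1 1 1" rows and drops the later duplicate (here uniq -f2 is used so that, after skipping the added line number and the domain field, all of the number columns are compared):

```shell
# Sample file from the question
printf '%s\n' \
  'dif-1-2-3-4.com 1 1 1' \
  'dif-1-2-3-5.com 1 1 2' \
  'dif-1-2-4-5.com 1 2 1' \
  'dif-1-3-4-5.com 2 1 1' \
  'dif-2-3-4-5.com 1 1 1' > /tmp/sample.txt

# decorate (cat -n) / sort / dedupe (uniq -f2) / undecorate (sed)
cat -n /tmp/sample.txt | sort -k3 -k1n | uniq -f2 | sed -e 's/^[[:space:]]*[0-9]*[[:space:]]*//'
```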
Awk is easier and faster in this case.
Upvotes: 1
Reputation: 133458
Try the following awk code too:
awk '!a[$2,$3,$4]++' Input_file
Explanation:
Create an array named a with $2,$3,$4 as its index. The condition !a[$2,$3,$4]++ is true only when the line's $2,$3,$4 are NOT already present in array a, and in that case it does 2 things: it records the $2,$3,$4 index in array a (the ++ increments its counter for the next time it is seen), and, since awk works in the mode of condition and then action, the missing action defaults to printing the current line. This goes on for all the lines in Input_file; the last line is not printed because its $2,$3,$4 are already present in array a. I hope this helps.
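As a side note, a comma inside an array subscript like a[$2,$3,$4] joins the values with awk's SUBSEP character (by default "\034"), which is why adjacent fields don't bleed into each other. A minimal illustration with made-up values:

```shell
# ("1","12") and ("11","2") would collide as plain concatenation "112",
# but SUBSEP keeps them distinct: "1" SUBSEP "12" vs "11" SUBSEP "2"
awk 'BEGIN{ a["1","12"]; a["11","2"]; n=0; for (k in a) n++; print n }'
```

This prints 2, one entry per distinct index.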
Upvotes: 2
Reputation: 84541
Another simple approach is sort with uniq, using a KEYDEF for fields 2-4 with sort and skipping field 1 with uniq, e.g.
$ sort file.txt -k 2,4 | uniq -f1
Example Use/Output
$ sort file.txt -k 2,4 | uniq -f1
dif-1-2-3-4.com 1 1 1
dif-1-2-3-5.com 1 1 2
dif-1-2-4-5.com 1 2 1
dif-1-3-4-5.com 2 1 1
Upvotes: 4
Reputation: 10865
Keep a running record of the triples already seen and only print the first time they appear:
$ awk '!(($2,$3,$4) in seen) {print; seen[$2,$3,$4]}' file
dif-1-2-3-4.com 1 1 1
dif-1-2-3-5.com 1 1 2
dif-1-2-4-5.com 1 2 1
dif-1-3-4-5.com 2 1 1
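The (($2,$3,$4) in seen) test is deliberate: in awk, merely referencing seen[$2,$3,$4] in the condition would create the element as a side effect, whereas the in operator only checks membership. A small illustration of the difference, using a throwaway array:

```shell
awk 'BEGIN{
  d = ("x" in a)                    # membership test only, creates nothing
  n=0; for (k in a) n++; print n    # prints 0: a["x"] does not exist
  d = (a["x"] == "")                # bare reference creates a["x"]
  n=0; for (k in a) n++; print n    # prints 1: now it exists
}'
```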
Upvotes: 2