Reputation: 83
I am trying to remove duplicated lines from a file, including their original occurrences, so that only lines appearing exactly once remain. The command below does that, but it reorders the lines, and I want them in the same order as in the input file.
awk '{++a[$0]}END{for(i in a) if (a[i]==1) print i}' test.txt
Input:
123
aaa
456
123
aaa
888
bbb
Output I want:
456
888
bbb
Upvotes: 3
Views: 307
Reputation: 14899
awk '{ b[$0]++; a[n++]=$0 } END { for (i=0; i<n; i++) if (b[a[i]]==1) print a[i] }' input
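The number of occurrences of each line is counted in array b, and the order of the lines is kept in array a. If, in the end, the count is 1, the line is printed. Note that for (i in a) traverses an awk array in an unspecified order, so a counted loop from 0 to n-1 is the reliable way to read a back in input order.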
Sorry, I misread the question at first; I have corrected the answer, which is now almost the same as @Sundeep's.
Upvotes: 1
Reputation: 133428
If you want to do this in awk only and are not worried about order, you could try the following.
awk '{a[$0]++};END{for(i in a){if(a[i]==1){print i}}}' Input_file
To get the unique values in the same order in which they occur in Input_file, try the following.
awk '
!a[$0]++{
  b[++count]=$0
}
{
  c[$0]++
}
END{
  for(i=1;i<=count;i++){
    if(c[b[i]]==1){
      print b[i]
    }
  }
}
' Input_file
Output will be as follows.
456
888
bbb
Explanation: a detailed walkthrough of the above code.
awk '                        ##Start of the awk program.
!a[$0]++{                    ##True only the first time the current line is seen; a[$0] is incremented afterwards.
  b[++count]=$0              ##Store the line in array b under the next index, preserving first-seen order.
}
{
  c[$0]++                    ##Count occurrences of the current line in array c.
}
END{                         ##END block, run after all input has been read.
  for(i=1;i<=count;i++){     ##Loop over the distinct lines in the order they first appeared.
    if(c[b[i]]==1){          ##If the line stored in b[i] occurred exactly once, then do the following.
      print b[i]             ##Print that line.
    }
  }
}
' Input_file                 ##Input_file is the name of the file to read.
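As a possible simplification (a minimal sketch, not a replacement for the version above): since !c[$0]++ already increments the count on every line, arrays a and c could be merged into one. The sample data is the input from the question, and the variable name result is only for illustration.

```shell
result=$(printf '%s\n' 123 aaa 456 123 aaa 888 bbb |
  awk '
    !c[$0]++ { b[++count] = $0 }   # first sighting: remember the line; c[$0] is bumped on every line
    END {
      for (i = 1; i <= count; i++) # walk distinct lines in first-seen order
        if (c[b[i]] == 1)          # keep lines that occurred exactly once
          print b[i]
    }')
printf '%s\n' "$result"
```

The pattern is evaluated for every input line, so the post-increment counts all occurrences while the action runs only on the first one.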
Upvotes: 4
Reputation: 23667
Simpler code if you are okay with reading the input file twice:
$ awk 'NR==FNR{a[$0]++; next} a[$0]==1' ip.txt ip.txt
456
888
bbb
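In case the NR==FNR idiom is unfamiliar, here is the same two-pass command spelled out with comments. This is a sketch only: the temporary file from mktemp and the variable names tmp and result are my own, and the sample data is the input from the question.

```shell
tmp=$(mktemp)                                   # scratch file standing in for ip.txt
printf '%s\n' 123 aaa 456 123 aaa 888 bbb > "$tmp"

result=$(awk '
  NR == FNR { a[$0]++; next }   # first pass: NR equals FNR only while reading the first file
  a[$0] == 1                    # second pass: print lines whose count is exactly 1
' "$tmp" "$tmp")

printf '%s\n' "$result"
rm -f "$tmp"
```

Because the second pass prints lines as they are read, the input order is preserved without storing the lines themselves, only their counts.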
With a single pass:
$ awk '{a[NR]=$0; b[$0]++} END{for(i=1;i<=NR;i++) if(b[a[i]]==1) print a[i]}' ip.txt
456
888
bbb
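One practical difference, as far as I can tell: the two-pass form needs a real file it can open twice, while the single-pass form also works on piped input, at the cost of holding every line in memory. A sketch with the question's sample data piped in (result is an illustrative name):

```shell
result=$(printf '%s\n' 123 aaa 456 123 aaa 888 bbb |
  awk '{ a[NR] = $0; b[$0]++ }                 # store each line by number, count occurrences
       END { for (i = 1; i <= NR; i++)         # replay lines in input order
               if (b[a[i]] == 1) print a[i] }')
printf '%s\n' "$result"
```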
Upvotes: 5