nuclearwinter27

Reputation: 83

shell awk script to remove duplicate lines

I am trying to remove duplicated lines from a file, including their first occurrence, so that only lines appearing exactly once remain. The command below does that, but it changes the order of the lines (they come out sorted), and I want them in the same order as in the input file.

awk '{++a[$0]}END{for(i in a) if (a[i]==1) print i}' test.txt
Input:
123
aaa
456
123
aaa
888
bbb

Output I want:
456
888
bbb

Upvotes: 3

Views: 307

Answers (3)

Luuk

Reputation: 14899

awk '{ b[$0]++; a[n++]=$0 } END { for (i=0; i<n; i++) if (b[a[i]]==1) print a[i] }' input

Array b counts how often each line occurs, and array a stores the lines in input order. In the END block, a line is printed only if its count is 1.

Sorry, I misread the question at first. I have corrected the answer, which is now almost the same as @Sundeep's.

Upvotes: 1

RavinderSingh13

Reputation: 133428

If you want to do this in awk only and are not worried about the order of the output, try the following.

awk '{a[$0]++};END{for(i in a){if(a[i]==1){print i}}}' Input_file


To get the unique values in the same order in which they occur in Input_file, try the following.

awk '
!a[$0]++{
  b[++count]=$0
}
{
  c[$0]++
}
END{
  for(i=1;i<=count;i++){
    if(c[b[i]]==1){
      print b[i]
    }
  }
}
'  Input_file

Output will be as follows.

456
888
bbb

Explanation: a commented version of the code above.

awk '                        ## Start the awk program.
!a[$0]++{                    ## True only the first time a line is seen (a[$0] is still 0 before the increment).
  b[++count]=$0              ## Store the line in array b under the next index, preserving first-seen order.
}
{
  c[$0]++                    ## Count how many times each line occurs.
}
END{                         ## After all input has been read:
  for(i=1;i<=count;i++){     ## Walk the distinct lines in first-seen order.
    if(c[b[i]]==1){          ## If the line occurred exactly once,
      print b[i]             ## print it.
    }
  }
}
'  Input_file                ## Name of the input file.
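
As a side note, arrays a and c carry overlapping information: c[$0] is zero exactly when the line has not been seen yet. The following is a minimal sketch of a one-array variant, not the code from the answer above; feeding the question's sample on stdin and capturing the output in a variable named result are assumptions made only for illustration.

```shell
# Sample input from the question, fed on stdin for illustration.
result=$(printf '%s\n' 123 aaa 456 123 aaa 888 bbb | awk '
!c[$0]++ {              # c[$0] is 0 only on first sight; ++ counts every occurrence
  b[++count] = $0       # remember distinct lines in first-seen order
}
END {
  for (i = 1; i <= count; i++)
    if (c[b[i]] == 1)   # keep lines that occurred exactly once
      print b[i]
}')
printf '%s\n' "$result"
```

The pattern `!c[$0]++` tests the old value and then increments it, so the same array serves as both the "seen" flag and the occurrence counter.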

Upvotes: 4

Sundeep

Reputation: 23667

Simpler code if you are okay with reading input file twice:

$ awk 'NR==FNR{a[$0]++; next} a[$0]==1' ip.txt ip.txt
456
888
bbb
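
To try the two-pass command on the sample from the question, something like the sketch below should work. The scratch directory created with mktemp and the variable name result are assumptions for illustration; ip.txt is the file name used above.

```shell
# Recreate the question's sample input in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' 123 aaa 456 123 aaa 888 bbb > "$tmpdir/ip.txt"

# Pass 1: NR==FNR holds only while the first file argument is being read,
#         so every line is counted and "next" skips the second rule.
# Pass 2: a[$0]==1 is a bare pattern, so matching lines are printed as-is,
#         in the order they are read.
result=$(awk 'NR==FNR{a[$0]++; next} a[$0]==1' "$tmpdir/ip.txt" "$tmpdir/ip.txt")
printf '%s\n' "$result"

rm -r "$tmpdir"
```

Because the second pass prints lines as it reads them, input order is preserved without storing the lines themselves; only the counts are kept in memory.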


With single pass:

$ awk '{a[NR]=$0; b[$0]++} END{for(i=1;i<=NR;i++) if(b[a[i]]==1) print a[i]}' ip.txt
456
888
bbb
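
One practical difference between the two forms: the single-pass version also works when the input arrives on a pipe, where it cannot be read a second time. A small sketch, assuming the sample data is generated with printf and the output is captured in a variable named result:

```shell
# Input comes from a pipe, so the two-file trick is not available;
# the single-pass form buffers every line and filters in the END block.
result=$(printf '%s\n' 123 aaa 456 123 aaa 888 bbb |
  awk '{a[NR]=$0; b[$0]++} END{for(i=1;i<=NR;i++) if(b[a[i]]==1) print a[i]}')
printf '%s\n' "$result"
```

The trade-off is memory: every input line is held in array a until the end of input.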

Upvotes: 5
