KGee
KGee

Reputation: 373

Remove duplicate records from a three column file

I am trying to remove some entries from a txt file containing 3 columns. The first two contains ID entries and the third one, contains its percentage as follows:

ID#3    ID#1    100.00
ID#4    ID#4    40.00
ID#4    ID#5    33.065
ID#5    ID#5    100.000    
ID#5    ID#4    33.065
ID#6    ID#6    100.000

I want to "remove" every entry with the same ID BUT ONLY WHEN the percentage is 100% so as the desired output will be like:

ID#3    ID#1    100.00    
ID#4    ID#4    40.00
ID#4    ID#5    33.065
ID#5    ID#4    33.065

I tried this:

cat file.txt | awk '$3!=100.0 && $1=$2 {print $1,$2}'

but I cant find a way to include the cases when the first two columns are not the same!

Upvotes: 2

Views: 152

Answers (2)

RavinderSingh13
RavinderSingh13

Reputation: 133590

Could you please try following.

awk '($1==$2) && $NF==100{next} 1' Input_file

Explanation: Adding detailed explanation for above code.

awk '                       ##Starting awk program from here.
($1==$2) && $NF==100{       ##Checking condition if $1(first field) equals to $2(2nd field) AND $NF(last field) equals 100 then do following.
  next                      ##next will SKIP all further statements from here.
}                           ##Closing BLOCK for above condition here.
1                           ##Mentioning 1 will print edited/non-edited lines here.
' Input_file                ##Mentioning Input_file name here.

Upvotes: 0

Sundeep
Sundeep

Reputation: 23677

$ awk '!($1==$2 && $3==100)' ip.txt
ID#3    ID#1    100.00
ID#4    ID#4    40.00
ID#4    ID#5    33.065
ID#5    ID#4    33.065
  • $1==$2 checks if first column same as second column
  • $3==100 checks if third column is 100
  • && to check if both the conditions above are true
  • !($1==$2 && $3==100) to invert the combined condition
  • can also use $1!=$2 || $3!=100 (See https://en.wikipedia.org/wiki/De_Morgan%27s_laws)

Also note that file can be passed directly to awk command. And not sure why you are using {print $1,$2} when the expected output shown contains three columns.

Upvotes: 3

Related Questions