Rinku

Reputation: 55

Compare two files based on the first column only and remove duplicate rows from the 2nd file in a shell script

I will ask my question with an example. I have 2 files:

File #1

002948998 752986QAK NTR974VTS 0000000    
102948932 752986QSC NTR974VTS 0000000    
102948933 752986QSC NTR974VTS 0000000

File #2

002901998 752986KFK NTR974MTS 0990000
002948998 752986QAQ NTR974VTS 0000000
002901998 752986KFK NTR974MTJ 0990000

Desired output:

002948998 752986QAK NTR974VTS 0000000    
102948932 752986QSC NTR974VTS 0000000    
102948933 752986QSC NTR974VTS 0000000    
002901998 752986KFK NTR974MTS 0990000

Note: there are no blank lines between the rows.

I'd like to compare file 1 and file 2 on their first columns and remove the entire row from file 2 when its first column matches one in file 1. I'd also like to save the result either back to file 1 or to a new file, file 3, which contains all the entries from file 1 and file 2 (without the duplicates from file 2). Please suggest a good solution as a shell script.

Currently I am using:

awk 'FNR==NR {a[$1];print;next} !($1 in a)' file1 file2 > file3

But it does not seem to compare on the first column only; instead it appears to compare the whole row.

Please help.

Upvotes: 1

Views: 2559

Answers (2)

glenn jackman

Reputation: 246754

This is a famous-ish awk idiom: print a line only when the first field is seen for the first time:

awk '!seen[$1]++' file1 file2 > file3

file3 then contains:

002948998 752986QAK NTR974VTS 0000000    
102948932 752986QSC NTR974VTS 0000000    
102948933 752986QSC NTR974VTS 0000000
002901998 752986KFK NTR974MTS 0990000

This relies on:

  • awk treating unset array elements as zero in a numeric context
  • post-increment returning the variable's current value before incrementing it
  • the default action for a "true" condition being to print the line
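
As a rough sketch of how those three points combine, the one-liner can be spelled out with an explicit test and action. The sample rows are the ones from the question; the scratch directory demo and the output name file3.long are made up for this illustration.

```shell
# Scratch directory (hypothetical name) so real files are not overwritten.
mkdir -p demo

printf '%s\n' \
  '002948998 752986QAK NTR974VTS 0000000' \
  '102948932 752986QSC NTR974VTS 0000000' \
  '102948933 752986QSC NTR974VTS 0000000' > demo/file1

printf '%s\n' \
  '002901998 752986KFK NTR974MTS 0990000' \
  '002948998 752986QAQ NTR974VTS 0000000' \
  '002901998 752986KFK NTR974MTJ 0990000' > demo/file2

# Short form: a pattern with no action, so matching lines are printed.
awk '!seen[$1]++' demo/file1 demo/file2 > demo/file3

# Long form of the same logic.
awk '{
    if (seen[$1] == 0)   # an unset element compares equal to zero
        print            # first time this key shows up
    seen[$1]++           # remember the key for later lines
}' demo/file1 demo/file2 > demo/file3.long

# Both forms should agree; show the result if they do.
cmp demo/file3 demo/file3.long && cat demo/file3
```

With the question's data this keeps the three rows of file 1 plus the first 002901998 row of file 2.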

To preserve all lines of the first file while removing duplicates from the second file:

awk '!seen[$1]++ || NR==FNR' file1 file2 > file3
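
The two commands only differ when the first file itself repeats a key. A small sketch with made-up rows (the directory demo2 and the keys 111 and 222 are invented for this example, not taken from the question) may make that visible:

```shell
mkdir -p demo2

# Made-up input: file1 repeats key 111; file2 repeats key 222
# and also reuses key 111 from file1.
printf '%s\n' '111 a' '111 b' > demo2/file1
printf '%s\n' '222 c' '111 d' '222 e' > demo2/file2

# First form: at most one line per key across both files.
awk '!seen[$1]++' demo2/file1 demo2/file2 > demo2/first_only

# Second form: NR==FNR holds only while the first file is being read,
# so every file1 line is printed; file2 is still de-duplicated.
awk '!seen[$1]++ || NR==FNR' demo2/file1 demo2/file2 > demo2/keep_file1

cat demo2/first_only    # prints "111 a" and "222 c"
cat demo2/keep_file1    # prints "111 a", "111 b" and "222 c"
```

Note that seen[$1]++ sits on the left of ||, so it is evaluated for every line and the keys of file 1 are recorded even when NR==FNR alone would have been enough to print the line.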

Upvotes: 5

shooper

Reputation: 626

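Maybe (note that grep -F matches a key anywhere in the line, not only in the first column, and duplicates within file2 are kept):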

cp file1 file3;
grep -Fv "$(cut -f 1 -d ' ' < file1)" file2 >> file3
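
One possible way to restrict the match to the first column is to turn each key into a pattern anchored at the start of the line. This is only a sketch: it assumes the keys contain no regex metacharacters (true for the all-digit keys in the question) and that columns are separated by a single space. The directory demo3, the file keys.pat and the sample rows are invented for this example.

```shell
mkdir -p demo3

# Made-up rows: key 111 is in file1; in file2 it appears once as a key
# and once only in the second column.
printf '%s\n' '111 a' > demo3/file1
printf '%s\n' '222 111' '111 b' > demo3/file2

# Turn each key into "^key " so it can only match the first column.
cut -d ' ' -f 1 demo3/file1 | sed 's/^/^/; s/$/ /' > demo3/keys.pat

cp demo3/file1 demo3/file3
grep -v -f demo3/keys.pat demo3/file2 >> demo3/file3

cat demo3/file3    # prints "111 a" and "222 111"
```

With plain grep -Fv the row "222 111" would have been dropped as well, because 111 occurs in its second column. Like the original, this variant does not remove keys repeated within file2.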

Upvotes: 0
