Reputation: 55
I will ask my question with an example. I have 2 files:
File #1
002948998 752986QAK NTR974VTS 0000000
102948932 752986QSC NTR974VTS 0000000
102948933 752986QSC NTR974VTS 0000000
File #2
002901998 752986KFK NTR974MTS 0990000
002948998 752986QAQ NTR974VTS 0000000
002901998 752986KFK NTR974MTJ 0990000
Desired output:
002948998 752986QAK NTR974VTS 0000000
102948932 752986QSC NTR974VTS 0000000
102948933 752986QSC NTR974VTS 0000000
002901998 752986KFK NTR974MTS 0990000
Note: there are no blank lines between the rows.
I'd like to compare file 1 and file 2 on their first column and drop any row from file 2 whose first column already appears in file 1. The result should be saved either back to file 1 or to a new file, file 3, containing all the entries from file 1 plus the entries from file 2 that are not duplicates. Please advise a good solution as a shell script.
Currently I am using:
awk 'FNR==NR {a[$1];print;next} !($1 in a)' file1 file2 > file3
But it does not seem to compare on the first column only; instead it appears to compare the whole row.
Please help.
Upvotes: 1
Views: 2559
Reputation: 246754
This is a famous-ish awk idiom: print a line only when the first field is seen for the first time:
awk '!seen[$1]++' file1 file2 > file3
002948998 752986QAK NTR974VTS 0000000
102948932 752986QSC NTR974VTS 0000000
102948933 752986QSC NTR974VTS 0000000
002901998 752986KFK NTR974MTS 0990000
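This relies on three awk behaviours: an array element that has never been set evaluates to 0 in a numeric context, the post-increment seen[$1]++ returns the old value before adding 1, and a pattern with no action prints the line whenever the pattern is true. So !seen[$1]++ is true only the first time a given first field appears.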
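For readers who find the one-liner too terse, here is a long-hand sketch of what it does, run against the sample data from the question. The output name file3_longhand is made up for this illustration; everything else comes from the thread.

```shell
# Recreate the two sample files from the question.
printf '%s\n' \
  '002948998 752986QAK NTR974VTS 0000000' \
  '102948932 752986QSC NTR974VTS 0000000' \
  '102948933 752986QSC NTR974VTS 0000000' > file1
printf '%s\n' \
  '002901998 752986KFK NTR974MTS 0990000' \
  '002948998 752986QAQ NTR974VTS 0000000' \
  '002901998 752986KFK NTR974MTJ 0990000' > file2

# Long-hand version of '!seen[$1]++' (file3_longhand is a hypothetical name).
awk '
{
    key = $1
    if (!(key in seen)) {   # first time this key appears in any input file
        print               # keep the whole line
    }
    seen[key] = 1           # remember the key for every later line
}' file1 file2 > file3_longhand

# The idiom itself, for comparison.
awk '!seen[$1]++' file1 file2 > file3
cmp file3 file3_longhand && echo "identical output"
```

The command in the question also keys on $1, but it only records keys from file1, so the repeated 002901998 in file2 is printed twice. The idiom records keys from both files, which is what removes that second row.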
To preserve all lines of the first file while removing duplicates only from the second file:
awk '!seen[$1]++ || NR==FNR' file1 file2 > file3
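The two variants only differ when file1 itself repeats a key. A small sketch with invented data (the files f1 and f2, their contents, and the output names are all hypothetical) shows the difference:

```shell
# Hypothetical input: f1 repeats key 111; f2 reuses 111 and repeats 333.
printf '%s\n' '111 a' '111 b' '222 c' > f1
printf '%s\n' '111 d' '333 e' '333 f' > f2

# Variant 1: keep only the first line for each key, across both files.
awk '!seen[$1]++' f1 f2 > first_only

# Variant 2: NR==FNR is true while reading the first file, so every
# f1 line is printed; f2 lines are printed only for unseen keys.
awk '!seen[$1]++ || NR==FNR' f1 f2 > keep_file1

echo "first_only:"; cat first_only
echo "keep_file1:"; cat keep_file1
```

Here first_only drops the line "111 b", while keep_file1 retains it; both drop "111 d" and "333 f". One caveat: NR==FNR also holds for the second file when the first file is empty, so with an empty file1 the second variant prints file2 unchanged.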
Upvotes: 5
Reputation: 626
Maybe:
cp file1 file3;
grep -Fv "$(cut -f 1 -d ' ' < file1)" file2 >> file3
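Note that grep -F matches each key anywhere in the line, not only in the first column, and this approach does not remove keys repeated within file2. A possible variant that anchors each key to the start of the line is sketched below. It assumes the keys contain no regex metacharacters and that fields are separated by a single space; keys.tmp is a made-up scratch file name.

```shell
# Recreate the two sample files from the question.
printf '%s\n' \
  '002948998 752986QAK NTR974VTS 0000000' \
  '102948932 752986QSC NTR974VTS 0000000' \
  '102948933 752986QSC NTR974VTS 0000000' > file1
printf '%s\n' \
  '002901998 752986KFK NTR974MTS 0990000' \
  '002948998 752986QAQ NTR974VTS 0000000' \
  '002901998 752986KFK NTR974MTJ 0990000' > file2

# Turn each key into an anchored pattern "^KEY " so that only the
# first column can match (assumes keys have no regex metacharacters).
cut -d ' ' -f 1 file1 | sed 's/.*/^& /' > keys.tmp

cp file1 file3
grep -v -f keys.tmp file2 >> file3   # keep file2 rows whose key is not in file1
rm -f keys.tmp
cat file3
```

On the sample data this yields five lines rather than the four in the desired output, because both 002901998 rows from file2 survive; removing repeats within file2 still needs something like the awk idiom.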
Upvotes: 0