Jakub

Reputation: 699

Program that compares two files and prints the lines common to both

I have a problem. I want to create a program that prints all lines that appear in both the first file and the second file. This is the command I am using:

awk 'NR==FNR {include[$0];next} $0 in include' eq3_dgdg_1.ndx eq3_dgdg_2.ndx | tee eq4_dgdg_2.ndx

Input file eq3_dgdg_1.ndx:

DGD1 SOL3605
DGD2 SOL1176
DGD2 SOL1598
DGD2 SOL63

Input file eq3_dgdg_2.ndx:

DGD1 SOL3605
DGD1 SOL3605
DGD2 SOL1176
DGD2 SOL1176
DGD2 SOL1945
DGD2 SOL63

Actual output. This is where the error is: DGD1 SOL3605 should appear only once, because the first file contains that line only once, not twice. Could you help me fix this?

DGD1 SOL3605
DGD1 SOL3605
DGD2 SOL1176
DGD2 SOL63

Expected output

DGD1 SOL3605
DGD2 SOL1176
DGD2 SOL63

Upvotes: 2

Views: 101

Answers (3)

Ed Morton

Reputation: 203209

Based on one possible interpretation of your question:

$ sort -u file2 | awk 'NR==FNR{a[$0];next} $0 in a' file1 -
DGD1 SOL3605
DGD2 SOL1176
DGD2 SOL63

or awk only:

$ awk 'NR==FNR{a[$0];next} $0 in a{print; delete a[$0]}' file1 file2
DGD1 SOL3605
DGD2 SOL1176
DGD2 SOL63
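
One difference between the two commands that the sample data does not reveal is output order. The following is a minimal sketch using hypothetical files (the names `file1`/`file2` in a temporary directory and their contents are made up for illustration, and are deliberately unsorted): the `sort -u` variant emits matches in sorted order, while the awk-only variant keeps the order in which lines first appear in the second file.

```shell
# Hypothetical sample files in a temporary directory (not the asker's data).
tmp=$(mktemp -d)
printf '%s\n' b a > "$tmp/file1"
printf '%s\n' b b a c > "$tmp/file2"

# Variant 1: deduplicate file2 by sorting it, then keep lines that are in file1.
# LC_ALL=C only pins the sort order so the result is reproducible.
sorted_out=$(LC_ALL=C sort -u "$tmp/file2" |
  awk 'NR==FNR{a[$0];next} $0 in a' "$tmp/file1" -)

# Variant 2: awk only; deleting the key after the first match
# prevents later duplicates in file2 from printing again.
ordered_out=$(awk 'NR==FNR{a[$0];next} $0 in a{print; delete a[$0]}' \
  "$tmp/file1" "$tmp/file2")

printf 'sort -u variant:\n%s\n' "$sorted_out"    # a, then b
printf 'awk-only variant:\n%s\n' "$ordered_out"  # b, then a

rm -rf "$tmp"
```

If the order of the second file matters, the awk-only form is probably the safer choice.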

Upvotes: 3

RavinderSingh13

Reputation: 133428

Could you please try the following? Written and tested with the shown samples in GNU awk.

awk 'FNR==NR{arr[$0];next} ($0 in arr) && !arr2[$0]++' eq3_dgdg_1.ndx eq3_dgdg_2.ndx

Explanation: a detailed, line-by-line explanation of the command above.

awk '                             ##Start of the awk program.
FNR==NR{                          ##FNR==NR is true only while the first file is being read.
  arr[$0]                         ##Create an entry in arr indexed by the current line.
  next                            ##Skip the remaining statements and read the next line.
}
($0 in arr) && !arr2[$0]++        ##Print the line if it is in arr and this is its first occurrence in the second file (tracked in arr2).
' eq3_dgdg_1.ndx eq3_dgdg_2.ndx   ##Input file names.
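
The `!arr2[$0]++` part may be easier to follow in isolation. A minimal sketch, assuming a made-up three-line input and a hypothetical array name `seen`: the post-increment returns the old count, so the expression is true only the first time a given line is encountered.

```shell
# Hypothetical input: the line "a" occurs twice.
# seen[$0]++ yields 0 on the first occurrence (so !0 is true and the line
# prints) and a positive count afterwards (so the line is skipped).
deduped=$(printf '%s\n' a b a | awk '!seen[$0]++')
printf '%s\n' "$deduped"   # prints a, then b
```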

Upvotes: 2

Kent

Reputation: 195039

If duplicated lines in a file are allowed, you need a counter. Give this a try:

awk 'NR==FNR{a[$0]++;next}a[$0]-->0' f1 f2

Let's test it with your data:

kent$  head f*
==> f1 <==
DGD1 SOL3605
DGD2 SOL1176
DGD2 SOL1598
DGD2 SOL63

==> f2 <==
DGD1 SOL3605
DGD1 SOL3605
DGD2 SOL1176
DGD2 SOL1945
DGD2 SOL63

kent$  awk 'NR==FNR{a[$0]++;next}a[$0]-->0' f1 f2
DGD1 SOL3605
DGD2 SOL1176
DGD2 SOL63
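
The counter only makes a visible difference when the first file itself contains duplicates, which the sample data does not. Here is a small sketch with hypothetical files (the contents are invented for illustration) comparing the counter approach with a print-once approach that deletes the key after the first match:

```shell
# Hypothetical data: "x" appears twice in f1 and three times in f2.
tmp=$(mktemp -d)
printf '%s\n' x x y > "$tmp/f1"
printf '%s\n' x x x y z > "$tmp/f2"

# Counter approach: every occurrence in f1 allows one matching line from f2.
counted=$(awk 'NR==FNR{a[$0]++;next} a[$0]-->0' "$tmp/f1" "$tmp/f2")

# Print-once approach: each common line is printed a single time.
once=$(awk 'NR==FNR{a[$0];next} $0 in a{print; delete a[$0]}' "$tmp/f1" "$tmp/f2")

printf 'counter:\n%s\n' "$counted"   # x, x, y
printf 'print once:\n%s\n' "$once"   # x, y

rm -rf "$tmp"
```

Which behaviour is wanted depends on whether a line that occurs twice in both files should be reported twice or once.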

Upvotes: 4
