rakesh kandukuri

How to find the non-matching records between two files using awk

file1

1|footbal|play1
2|cricket1|play2
3|golf|play3
4|tennis|play4
5|bowling|play5

file2

1|footbal|play1
2|cricket|play2
4|tennis|play4

I am comparing file2 with file1, and the output should be:

3|golf|play3
5|bowling|play5

I need only the records that are present in file1 but not in file2.

awk 'NR==FNR {exclude[$0];next} !($0 in exclude)' file2.txt file1.txt

This is not giving the expected result.
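
One likely explanation, sketched below under the assumption that the files contain exactly the sample lines shown (Unix line endings, no trailing spaces): the command compares whole lines, and 2|cricket1|play2 in file1 is not equal to 2|cricket|play2 in file2, so that record is printed alongside the two expected ones. The helper name `whole_line_diff` and the scratch directory are hypothetical, for illustration only.

```shell
#!/bin/sh
# Recreate the sample data in a scratch directory (illustration only).
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

printf '%s\n' '1|footbal|play1' '2|cricket1|play2' '3|golf|play3' \
  '4|tennis|play4' '5|bowling|play5' > "$dir/file1.txt"
printf '%s\n' '1|footbal|play1' '2|cricket|play2' '4|tennis|play4' > "$dir/file2.txt"

# Hypothetical helper: the command from the question, which compares entire lines.
whole_line_diff() {
  awk 'NR==FNR {exclude[$0]; next} !($0 in exclude)' "$dir/file2.txt" "$dir/file1.txt"
}

whole_line_diff
# 2|cricket1|play2   <- not identical to "2|cricket|play2" in file2
# 3|golf|play3
# 5|bowling|play5
```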

Answers (2)

RavinderSingh13

EDIT: Trying one more time to get the OP's expected output, this time using the first field as the index key.

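awk '
BEGIN{
  FS="|"                # split fields on the pipe character
}
NR==FNR{                # true only while reading the first file (file2.txt)
  exclude[$1]           # remember its first-column key
  next                  # skip to the next line
}
!($1 in exclude)        # for file1.txt: print lines whose key was not seen
' file2.txt file1.txt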
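
Run against the same sample data, keying on the first field ignores the cricket/cricket1 difference, because only $1 (the number) is compared. This is a minimal sketch, assuming the number in the first column uniquely identifies a record; `key_diff` and the scratch directory are hypothetical, and -F'|' is used as shorthand for BEGIN{FS="|"}.

```shell
#!/bin/sh
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

printf '%s\n' '1|footbal|play1' '2|cricket1|play2' '3|golf|play3' \
  '4|tennis|play4' '5|bowling|play5' > "$dir/file1.txt"
printf '%s\n' '1|footbal|play1' '2|cricket|play2' '4|tennis|play4' > "$dir/file2.txt"

# Hypothetical helper: same logic as the answer, keyed on field 1 only.
key_diff() {
  awk -F'|' 'NR==FNR {exclude[$1]; next} !($1 in exclude)' "$dir/file2.txt" "$dir/file1.txt"
}

key_diff
# 3|golf|play3
# 5|bowling|play5
```

One caveat that applies to the NR==FNR idiom in general: if file2.txt is empty, NR==FNR stays true while reading file1.txt, so every line goes into exclude and nothing is printed.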

Your code looks good. Could you please try the following? There may be control-M (carriage return) characters in your samples; try removing them before processing.

awk '{gsub(/\r|[[:space:]]+$/,"")} NR==FNR {exclude[$0];next} !($0 in exclude)' file2.txt file1.txt

I am also removing any trailing whitespace from the ends of the lines, in case there is any.
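
To see why stray carriage returns matter, the sketch below writes file2 with Windows (CRLF) line endings, which is only an assumption about how the data might have been produced. Without cleanup no line matches, so all of file1 is printed; stripping a trailing \r restores the matches. The helper names are hypothetical, and the regex is narrowed to /\r$/ rather than the broader /\r|[[:space:]]+$/ used above.

```shell
#!/bin/sh
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

printf '%s\n' '1|footbal|play1' '2|cricket1|play2' '3|golf|play3' \
  '4|tennis|play4' '5|bowling|play5' > "$dir/file1.txt"
# file2 gets CRLF line endings, so each stored line ends in an invisible \r.
printf '%s\r\n' '1|footbal|play1' '2|cricket|play2' '4|tennis|play4' > "$dir/file2.txt"

# No cleanup: "1|footbal|play1\r" never equals "1|footbal|play1".
raw_diff() {
  awk 'NR==FNR {exclude[$0]; next} !($0 in exclude)' "$dir/file2.txt" "$dir/file1.txt"
}

# Strip one trailing carriage return from every record before comparing.
clean_diff() {
  awk '{sub(/\r$/, "")} NR==FNR {exclude[$0]; next} !($0 in exclude)' "$dir/file2.txt" "$dir/file1.txt"
}

raw_diff     # prints all five lines of file1
clean_diff   # 2|cricket1|play2, 3|golf|play3, 5|bowling|play5
```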

Shawn

You can certainly use awk, but comm is purpose-built to print out commonalities and differences between two files:

$ comm -23 file1.txt file2.txt
3|golf|play3
5|bowling|play5

(I assume the cricket1 in your sample file1 is a typo, given your expected output.)

The catch is that the files have to be sorted in lexicographic order, whereas, based on your sample, yours are sorted numerically by the first column, which gives a different order once you have a 10 or higher. So a minor change might be needed (this requires bash, zsh, or another shell that understands the <(command) syntax):

comm -23 <(sort file1.txt) <(sort file2.txt)
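
If <(command) is not available (for example under a plain POSIX sh), the same idea should work with sorted temporary copies. The sketch below assumes LC_ALL=C so that sort and comm agree on collation, uses the cricket spelling from the expected output, and adds a hypothetical 10|rugby|play10 record to show the ordering effect: comm emits its results in lexicographic order, so 10 comes out before 3, and a final sort -t'|' -k1,1n restores numeric order. The helper `only_in_file1` is hypothetical.

```shell
#!/bin/sh
# Assumption: the C locale makes sort and comm use the same byte-wise collation.
export LC_ALL=C
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

# Sample data with cricket1 corrected, plus a hypothetical record 10.
printf '%s\n' '1|footbal|play1' '2|cricket|play2' '3|golf|play3' \
  '4|tennis|play4' '5|bowling|play5' '10|rugby|play10' > "$dir/file1.txt"
printf '%s\n' '1|footbal|play1' '2|cricket|play2' '4|tennis|play4' > "$dir/file2.txt"

# Hypothetical helper: sort into temp files instead of using <(sort ...).
only_in_file1() {
  sort "$dir/file1.txt" > "$dir/file1.sorted"
  sort "$dir/file2.txt" > "$dir/file2.sorted"
  comm -23 "$dir/file1.sorted" "$dir/file2.sorted"
}

only_in_file1                        # 10|rugby|play10, 3|golf|play3, 5|bowling|play5
only_in_file1 | sort -t'|' -k1,1n    # 3|golf|play3, 5|bowling|play5, 10|rugby|play10
```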

comm takes three important options: -1, which suppresses lines present only in the first file; -2, which suppresses lines present only in the second file; and -3, which suppresses lines present in both files. So -23 prints only the lines unique to the first file, and -13 would print the lines unique to the second file.
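
A small sketch of the flag combinations on the sample data, with cricket1 corrected as this answer assumes and a hypothetical 6|hockey|play6 record added to file2 so that -13 has something to print. -12 is not mentioned above; it is the standard combination that keeps only the third column (lines present in both files). The `run_comm` helper is hypothetical, and the inputs are already in C-locale sorted order.

```shell
#!/bin/sh
export LC_ALL=C
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

printf '%s\n' '1|footbal|play1' '2|cricket|play2' '3|golf|play3' \
  '4|tennis|play4' '5|bowling|play5' > "$dir/file1.txt"
# Hypothetical extra record that exists only in file2.
printf '%s\n' '1|footbal|play1' '2|cricket|play2' '4|tennis|play4' '6|hockey|play6' > "$dir/file2.txt"

# Hypothetical helper: run comm with the given flag combination.
run_comm() {
  comm "$1" "$dir/file1.txt" "$dir/file2.txt"
}

run_comm -23   # only in file1: 3|golf|play3, 5|bowling|play5
run_comm -13   # only in file2: 6|hockey|play6
run_comm -12   # in both: 1|footbal|play1, 2|cricket|play2, 4|tennis|play4
```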