Reputation: 305
file1
1|footbal|play1
2|cricket1|play2
3|golf|play3
4|tennis|play4
5|bowling|play5
file2
1|footbal|play1
2|cricket|play2
4|tennis|play4
I am comparing file2 with file1, and the output should be:
3|golf|play3
5|bowling|play5
I need only the records that are present in file1 but not in file2.
awk 'NR==FNR {exclude[$0];next} !($0 in exclude)' file2.txt file1.txt
This does not give the expected result.
Upvotes: 1
Views: 1313
Reputation: 133458
EDIT: Trying once more to get the OP's expected output, this time using the first field as the array key.
awk '
BEGIN{
FS="|"
}
NR==FNR{
exclude[$1]
next
}
!($1 in exclude)
' file2.txt file1.txt
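To see why keying on the first field matters for this sample, here is a minimal sketch that recreates the question's data in a scratch directory (the mktemp directory and the variable names are assumptions for illustration) and runs both comparisons side by side. It assumes the first field is a unique record ID.

```shell
# Recreate the sample data from the question in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' '1|footbal|play1' '2|cricket1|play2' '3|golf|play3' \
              '4|tennis|play4' '5|bowling|play5' > "$tmpdir/file1.txt"
printf '%s\n' '1|footbal|play1' '2|cricket|play2' '4|tennis|play4' > "$tmpdir/file2.txt"

# Whole-line comparison: record 2 differs (cricket1 vs cricket), so it is printed too.
by_line=$(awk 'NR==FNR {exclude[$0]; next} !($0 in exclude)' \
          "$tmpdir/file2.txt" "$tmpdir/file1.txt")

# First-field comparison: only records whose ID is missing from file2 are printed.
by_key=$(awk -F'|' 'NR==FNR {exclude[$1]; next} !($1 in exclude)' \
         "$tmpdir/file2.txt" "$tmpdir/file1.txt")

printf 'whole line:\n%s\n\nfirst field:\n%s\n' "$by_line" "$by_key"
rm -rf "$tmpdir"
```

The whole-line version also reports 2|cricket1|play2, while the first-field version prints only records 3 and 5, which matches the expected output in the question.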
Your code looks good. Could you please try the following? There may be Control-M (carriage return) characters in your samples, so try removing them before processing.
awk '{gsub(/\r|[[:space:]]+$/,"")} NR==FNR {exclude[$0];next} !($0 in exclude)' file2.txt file1.txt
I am also removing trailing whitespace from the end of each line, in case there is any.
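As a rough illustration of the carriage-return problem, the sketch below builds two tiny files, one with CRLF line endings and one with LF endings (the file contents and variable names are made up for the demo), counts the lines that end in a carriage return, and compares the result with and without stripping it.

```shell
tmpdir=$(mktemp -d)
# file1 has Windows-style (CRLF) line endings, file2 has Unix-style (LF) endings.
printf '%s\r\n' '1|footbal|play1' '3|golf|play3' > "$tmpdir/file1.txt"
printf '%s\n' '1|footbal|play1' > "$tmpdir/file2.txt"

# Count the lines in file1 that end with a carriage return.
cr_lines=$(awk '/\r$/ {n++} END {print n+0}' "$tmpdir/file1.txt")

# Without cleanup, no line of file1 matches file2, so every line is printed.
raw_count=$(awk 'NR==FNR {exclude[$0]; next} !($0 in exclude)' \
            "$tmpdir/file2.txt" "$tmpdir/file1.txt" | awk 'END {print NR}')

# Stripping the trailing carriage return first makes identical records compare equal.
cleaned=$(awk '{sub(/\r$/, "")} NR==FNR {exclude[$0]; next} !($0 in exclude)' \
          "$tmpdir/file2.txt" "$tmpdir/file1.txt")

printf 'CR lines: %s\nunmatched before cleanup: %s\nafter cleanup: %s\n' \
       "$cr_lines" "$raw_count" "$cleaned"
rm -rf "$tmpdir"
```

If the count of carriage-return lines for your real files is zero, line endings are probably not the cause and the data itself differs.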
Upvotes: 2
Reputation: 52344
You can certainly use awk, but comm is purpose-built to print the commonalities and differences between two files:
$ comm -23 file1.txt file2.txt
3|golf|play3
5|bowling|play5
(I assume the cricket1 in your sample file1 is a typo, given your expected output.)
The catch is that the files have to be sorted in lexicographic order. Based on your sample, yours are sorted numerically on the first column, which is a different order once you reach 10 or higher. So a minor change might be needed (this requires bash, zsh, or another shell that understands <(command) syntax):
comm -23 <(sort file1.txt) <(sort file2.txt)
comm takes three important options: -1, which suppresses lines present only in the first file; -2, which suppresses lines present only in the second file; and -3, which suppresses lines present in both files. So -23 prints only the lines that are unique to the first file, and -13 would print only the lines that are unique to the second file.
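For shells without <(command) support, the same idea can be sketched with sorted temporary files. The sample records with IDs 9, 10 and 11 are invented here to show the ordering issue, and forcing LC_ALL=C for both sort and comm is an assumption made so that the two tools agree on the collation order.

```shell
tmpdir=$(mktemp -d)
# IDs 9, 10 and 11 are in numeric order, which is not lexicographic order.
printf '%s\n' '9|golf|play9' '10|tennis|play10' '11|bowling|play11' > "$tmpdir/file1.txt"
printf '%s\n' '10|tennis|play10' > "$tmpdir/file2.txt"

# Sort both inputs the same way, in the same locale that comm will use.
LC_ALL=C sort "$tmpdir/file1.txt" > "$tmpdir/file1.sorted"
LC_ALL=C sort "$tmpdir/file2.txt" > "$tmpdir/file2.sorted"

# -23 keeps only the lines that are unique to the first file.
only_in_file1=$(LC_ALL=C comm -23 "$tmpdir/file1.sorted" "$tmpdir/file2.sorted")
printf '%s\n' "$only_in_file1"
rm -rf "$tmpdir"
```

Note that the result comes back in sorted order (11 before 9 here), not in the original file order, so pipe it through a numeric sort afterwards if the order matters.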
Upvotes: 0