Reputation: 129
I have a long series of files. Some of them have lines in common. I am trying to use awk to find the lines that are different between two files and then print that number to a variable for use outside of awk.
Here is what my awk code currently looks like:
awk 'NR==FNR{a[$1FS$2]=$0;next} {print (!a[$1FS$2]?$0:"")}' C6H6_1651.com C6H6_1652.com | awk 'END { print NR }'
What I get out is 32, which is the number of lines in each of those files. I know from looking at those files that the desired output should be 2, as there are only two lines that are different between the two files.
Other arrangements of these awk commands that I have tried are:
awk 'NR==FNR{!a[$1FS$2]?$0:"";next} END { print NR }' C6H6_1651.com C6H6_1652.com
which outputs 64
awk 'NR==FNR{a[$1FS$2]=$0;next} {print (!a[$1FS$2]?$0:"")} END { printf NR }' C6H6_1651.com C6H6_1652.com
which outputs a line for every line in the document, but the only lines that contain text are the ones that don't match between the two files. 64 then follows this block of text.
Here are the contents of C6H6_1651.com
%chk=C6H6_1651.chk
%nproc=20
# mp2/cc-pVTZ
C6H6_1651
0 1
C 0.000000000 1.394800000 0.000000000
C 0.000000000 -1.394800000 0.000000000
C 1.207900000 0.697400000 0.000000000
C -1.207900000 0.697400000 0.000000000
C -1.207900000 -0.697400000 0.000000000
C 1.207900000 -0.697400000 0.000000000
C 0.000000000 1.394800000 3.000000000
C 0.000000000 -1.394800000 3.000000000
C 1.207900000 0.697400000 3.000000000
C -1.207900000 0.697400000 3.000000000
C -1.207900000 -0.697400000 3.000000000
C 1.207900000 -0.697400000 3.000000000
H 0.000000000 2.482200000 0.000000000
H 2.149700000 1.241100000 0.000000000
H -2.149700000 1.241100000 0.000000000
H -2.149700000 -1.241100000 0.000000000
H 2.149700000 -1.241100000 0.000000000
H 0.000000000 -2.482200000 0.000000000
H 0.000000000 2.482200000 3.000000000
H 2.149700000 1.241100000 3.000000000
H -2.149700000 1.241100000 3.000000000
H -2.149700000 -1.241100000 3.000000000
H 2.149700000 -1.241100000 3.000000000
H 0.000000000 -2.482200000 3.000000000
Here are the contents of C6H6_1652.com
%chk=C6H6_1652.chk
%nproc=20
# mp2/cc-pVTZ
C6H6_1652
0 1
C 0.000000000 1.394800000 0.000000000
C 0.000000000 -1.394800000 0.000000000
C 1.207900000 0.697400000 0.000000000
C -1.207900000 0.697400000 0.000000000
C -1.207900000 -0.697400000 0.000000000
C 1.207900000 -0.697400000 0.000000000
C 0.000000000 1.394800000 3.000000000
C 0.000000000 -1.394800000 3.000000000
C 1.207900000 0.697400000 3.000000000
C -1.207900000 0.697400000 3.000000000
C -1.207900000 -0.697400000 3.000000000
C 1.207900000 -0.697400000 3.000000000
H 0.000000000 2.482200000 0.000000000
H 2.149700000 1.241100000 0.000000000
H -2.149700000 1.241100000 0.000000000
H -2.149700000 -1.241100000 0.000000000
H 2.149700000 -1.241100000 0.000000000
H 0.000000000 -2.482200000 0.000000000
H 0.000000000 2.482200000 3.000000000
H 2.149700000 1.241100000 3.000000000
H -2.149700000 1.241100000 3.000000000
H -2.149700000 -1.241100000 3.000000000
H 2.149700000 -1.241100000 3.000000000
H 0.000000000 -2.482200000 3.000000000
Upvotes: 0
Views: 139
Reputation: 133710
If you want to do this in awk, try the following. This prints the lines that are present in both files:
awk '
FNR==NR{
array[$0]
next
}
($0 in array)
' Input_file1 Input_file2
Or, to get the number of matching lines within awk itself, try:
awk '
FNR==NR{
array[$0]
next
}
($0 in array){
count++
}
END{
print "Total matching lines are: " count+0
}
' Input_file1 Input_file2
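To see the lines which are present in file2 and not in file1 (the first file named on the command line is loaded into the array, and the second one is checked against it), try:
awk '
FNR==NR{
array[$0]
next
}
!($0 in array)
' Input_file1 Input_file2
Or, to count them:
awk '
FNR==NR{
array[$0]
next
}
!($0 in array){
count++
}
END{
print "Total lines found in file2 and NOT in file1 are: " count+0
}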
' Input_file1 Input_file2
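Since the question asks for the number in a variable outside of awk, one option is shell command substitution. The following is only a minimal sketch, assuming a POSIX shell: the two tiny sample files and the names `tmpdir`, `seen`, and `diff_count` are made up for illustration and stand in for the real .com files. It counts the lines of the second file that never occur in the first.

```shell
# Build two tiny hypothetical input files that differ in exactly two lines.
tmpdir=$(mktemp -d)
printf '%s\n' '%chk=A.chk' 'A' '0 1' 'C 0.0 1.3948 0.0' > "$tmpdir/file1"
printf '%s\n' '%chk=B.chk' 'B' '0 1' 'C 0.0 1.3948 0.0' > "$tmpdir/file2"

# Count lines of file2 that never occur in file1; print only the number
# so that command substitution can store it in a shell variable.
diff_count=$(awk '
FNR==NR { seen[$0]; next }      # first file: remember every line
!($0 in seen) { n++ }           # second file: count unseen lines
END { print n+0 }               # n+0 prints 0 instead of an empty string
' "$tmpdir/file1" "$tmpdir/file2")

echo "Differing lines: $diff_count"
rm -rf "$tmpdir"
```

With these sample files the script prints `Differing lines: 2`, and `$diff_count` can then be used in later shell commands.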
To get the lines which are present in file1 and not in file2, swap the order of the files on the command line:
awk '
FNR==NR{
array[$0]
next
}
!($0 in array)
' Input_file2 Input_file1
Or, to count them:
awk '
FNR==NR{
array[$0]
next
}
!($0 in array){
count++
}
END{
print "Total lines found in file1 and NOT in file2 are: " count+0
}
' Input_file2 Input_file1
The commands above that have no END block print the lines themselves. If you only need the number of lines, append | wc -l to those commands.
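As a rough illustration of the `| wc -l` route, and of why the pipeline in the question reported the full line count: `print` with an empty string still emits a blank line, so every line of the second file ends up being counted. This is a sketch assuming a POSIX shell; the sample files and the names `tmpdir`, `all_lines`, and `only_diff` are hypothetical.

```shell
# Two tiny hypothetical input files that differ in exactly two lines.
tmpdir=$(mktemp -d)
printf '%s\n' '%chk=A.chk' 'A' '0 1' 'C 0.0 1.3948 0.0' > "$tmpdir/file1"
printf '%s\n' '%chk=B.chk' 'B' '0 1' 'C 0.0 1.3948 0.0' > "$tmpdir/file2"

# The expression from the question: matching lines are printed as empty
# lines, so wc -l still counts them.
all_lines=$(awk 'NR==FNR{a[$1FS$2]=$0;next} {print (!a[$1FS$2]?$0:"")}' \
    "$tmpdir/file1" "$tmpdir/file2" | wc -l | tr -d '[:space:]')

# A pattern without an action prints only the lines that satisfy it,
# so wc -l counts just the lines missing from the first file.
only_diff=$(awk 'FNR==NR{array[$0]; next} !($0 in array)' \
    "$tmpdir/file1" "$tmpdir/file2" | wc -l | tr -d '[:space:]')

echo "pipeline from the question: $all_lines, pattern only: $only_diff"
rm -rf "$tmpdir"
```

For these sample files the first count is 4 (every line of the second file) and the second is 2. The `tr -d '[:space:]'` step is only there because some `wc` implementations pad their output with spaces.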
Upvotes: 2