Reputation: 2378
I have 2 files.
File1 content looks like:
000000513609200,238/PLMN/000100
000000513609200,238/PLMN/000200
000050354428060,238/PLMN/000200
000050354428060,238/PLMN/000100
001212131415120,238/PLMN/000100
...
...
File2 contents:
000000513609200,238/PLMN/000100
000000513609200,238/PLMN/000200
000050354428060,238/PLMN/000200
000050354428060,238/PLMN/000100
001212131415120,238/PLMN/000100
...
...
File1 has close to 15000 records and file2 has close to 20000 records. I want to find the lines (records) present only in file1 or only in file2. I'm using the command below:
comm -3 <(sort file1) <(sort file2) > file6
Is this a good option?
Also, how exactly does sort work with these records? How does it know which column to use as the primary key?
Also, can you suggest a simple awk script that compares file1 and file2 and writes the lines present only in file1 or only in file2 to file7, so that I can compare the outputs? I want to make sure that my comm command is yielding the same result.
Upvotes: 1
Views: 3913
Reputation: 7862
This sorts both files together with the -u (unique) flag, which removes all duplicate lines, so every distinct line from either file appears exactly once.
sort -u file1 file2 > file6
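Note that sort -u produces the union of the two files, so lines common to both are still in the output. If the goal is only the lines found in one file but not the other, piping through uniq -u is one possible variation. The following is a minimal sketch on hypothetical toy data, and it assumes that neither file contains repeated lines of its own (a line repeated inside one file would be dropped as well):

```shell
# Toy data (hypothetical); each file is assumed to contain no repeated lines.
tmp=$(mktemp -d)
printf '%s\n' a b c > "$tmp/file1"
printf '%s\n' b c d > "$tmp/file2"

# sort -u keeps one copy of every distinct line: the union of both files.
union=$(sort -u "$tmp/file1" "$tmp/file2")

# uniq -u keeps only lines that occur exactly once in the sorted stream,
# i.e. lines present in just one of the two files.
only_one=$(sort "$tmp/file1" "$tmp/file2" | uniq -u)

printf 'union:\n%s\n' "$union"
printf 'in one file only:\n%s\n' "$only_one"
rm -rf "$tmp"
```

On this toy input the union has four lines, while only a and d appear in a single file.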
Upvotes: 2
Reputation: 786289
Using awk you can do this without sorting:
awk 'FNR==NR {
    a[$0]
    next
}
{
    if ($0 in a)
        delete a[$0]
    else
        print
}
END {
    for (i in a)
        print i
}' file1 file2
Similarly, using grep you can get the same result:
{ grep -vxFf file1 file2; grep -vxFf file2 file1; }
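To check that comm and awk agree, as the question asks, both outputs can be normalised and compared. This is only a sketch under stated assumptions: the records are hypothetical samples in the question's format, the data contains no tab characters (so stripping the tab that comm -3 puts in front of second-file lines is safe), and neither file has repeated lines. The names file6 and file7 follow the question.

```shell
tmp=$(mktemp -d)
# Hypothetical sample records in the same format as the question.
printf '%s\n' '000000513609200,238/PLMN/000100' \
              '000050354428060,238/PLMN/000200' \
              '001212131415120,238/PLMN/000100' > "$tmp/file1"
printf '%s\n' '000050354428060,238/PLMN/000200' \
              '000000513609200,238/PLMN/000100' \
              '001212131415120,238/PLMN/000300' > "$tmp/file2"

# comm needs sorted input.
sort "$tmp/file1" > "$tmp/s1"
sort "$tmp/file2" > "$tmp/s2"

# comm -3 indents lines unique to the second file with a tab; strip it, then sort.
comm -3 "$tmp/s1" "$tmp/s2" | tr -d '\t' | sort > "$tmp/file6"

# Same logic as the awk answer above, sorted so the order matches file6.
awk 'FNR==NR { a[$0]; next }
     ($0 in a) { delete a[$0]; next }
     { print }
     END { for (i in a) print i }' "$tmp/file1" "$tmp/file2" | sort > "$tmp/file7"

# cmp -s is silent and only sets the exit status.
if cmp -s "$tmp/file6" "$tmp/file7"; then verdict=same; else verdict=different; fi
echo "$verdict"
result=$(cat "$tmp/file6")
rm -rf "$tmp"
```

Sorting both outputs first matters because awk's for (i in a) loop visits array keys in an unspecified order.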
Upvotes: 2
Reputation: 67567
If the files are sorted (or can be sorted on the fly), you can also try join. Since you didn't provide good test input, I'm showing it on a toy example:
$ seq 5 > f1
$ seq 3 9 > f2
This gives the records common to both files, the same as comm -12 f1 f2:
$ join f1 f2
3
4
5
This gives the unmatched records from both files, the same as comm -3 f1 f2 | sed 's/^\t//':
$ join -v1 -v2 f1 f2
1
2
6
7
8
9
Upvotes: 0
Reputation: 14035
If I understood correctly, to simply sort the lines based on any 'column', you can use:
sort -t '/' -k 3 file1 file2 > file6
where -t '/' specifies the column delimiter, and -k 3 specifies the column number based on this delimiter.
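To illustrate how the key is chosen: when no -k option is given, sort compares entire lines, so there is no separate "primary key" column. The sketch below contrasts that default with a key restricted to the third '/'-separated field. The records are hypothetical samples in the question's format, and LC_ALL=C is set here only to make the ordering byte-wise and predictable.

```shell
tmp=$(mktemp -d)
# Hypothetical sample records in the same format as the question.
printf '%s\n' '001212131415120,238/PLMN/000100' \
              '000000513609200,238/PLMN/000200' \
              '000050354428060,238/PLMN/000100' > "$tmp/file1"

# No key given: the entire line is the sort key.
whole=$(LC_ALL=C sort "$tmp/file1")

# -t '/' splits each line on slashes; -k 3,3 compares only the third field.
# Lines with equal keys fall back to a whole-line comparison.
by_field=$(LC_ALL=C sort -t '/' -k 3,3 "$tmp/file1")

printf 'whole line:\n%s\n' "$whole"
printf 'third field:\n%s\n' "$by_field"
rm -rf "$tmp"
```

With the whole-line default, the record ending in 000200 comes first because its leading digits are smallest; with the third-field key it moves to the end.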
As for the second question, if you just want to compare the files, you can try the diff command and see if it is helpful to you.
Upvotes: 0