Reputation: 53
Below is my file 1 content:
123|yid|def|
456|kks|jkl|
789|mno|vsasd|
And this is my file 2 content:
123|abc|def|
456|ghi|jkl|
789|mno|pqr|
134|rst|uvw|
I only want to compare column 1 of File 2 against column 1 of File 1. Based on the files above, the output should contain only:
134|rst|uvw|
Line-by-line comparison is not the answer, since columns 2 and 3 differ between the files; only column 1 contains exactly the same values in both.
How can I achieve this?
Currently I'm using this in my code:
#sort FILEs first before comparing
sort $FILE_1 > $FILE_1_sorted
sort $FILE_2 > $FILE_2_sorted
for oid in $(cat $FILE_1_sorted |awk -F"|" '{print $1}');
do
echo "output oid $oid"
#for every oid in FILE 1, compare it with oid FILE 2 and output the difference
grep -v diff "^${oid}|" $FILE_1 $FILE_2 | grep \< | cut -d \ -f 2 > $FILE_1_tmp
Upvotes: 1
Views: 323
Reputation: 85570
You can do this in Awk very easily!
awk 'BEGIN{FS=OFS="|"}FNR==NR{unique[$1]; next}!($1 in unique)' file1 file2
Awk works by processing input lines one at a time. It also provides two special clauses, BEGIN{} and END{}, which enclose actions to be run before and after the input is processed.
So the part BEGIN{FS=OFS="|"} runs before any input is read. FS and OFS are special variables in Awk which hold the input and output field separators. Since you have provided a file that is delimited by |, you need FS="|" to split each line into fields, and OFS="|" to join fields with | again when they are printed.
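As a small illustration of the two separators, here is a sketch on a single sample line (the variable names line, first and pair are just made up for this example). It shows FS doing the splitting and OFS being used when fields are printed separated by commas:

```shell
line='123|abc|def|'

# FS="|" splits the line on "|", so $1 is the first column.
first=$(printf '%s\n' "$line" | awk 'BEGIN{FS="|"} {print $1}')

# OFS="|" is what print places between comma-separated fields.
pair=$(printf '%s\n' "$line" | awk 'BEGIN{FS=OFS="|"} {print $1, $3}')

printf '%s\n%s\n' "$first" "$pair"
```

Note that the one-liner above prints each matching line unchanged, so setting OFS there is harmless but not strictly needed; it only matters once you print individual fields or modify a field.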
The main part of the command comes after the BEGIN clause. The condition FNR==NR is true only while the first file argument is being read, because FNR is the line number within the current file, while NR is the running count of lines read across all files. So for each line of the first file, $1 is stored as a key in the array called unique, and next skips the rest of the program for that line. When the second file is processed, the part !($1 in unique) prints only those lines whose $1 value is not in the array, so lines whose first column also appears in the first file are dropped.
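If you want to watch the two counters yourself, here is a rough sketch that recreates the sample files from the question in a scratch directory (the directory and the variable name result are assumptions for the demo, not part of the original command) and then runs the same one-liner:

```shell
# Recreate the sample data from the question in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' '123|yid|def|' '456|kks|jkl|' '789|mno|vsasd|' > "$tmpdir/file1"
printf '%s\n' '123|abc|def|' '456|ghi|jkl|' '789|mno|pqr|' '134|rst|uvw|' > "$tmpdir/file2"

# NR keeps counting across both files; FNR restarts at 1 for each file.
awk '{ print FILENAME, "NR=" NR, "FNR=" FNR }' "$tmpdir/file1" "$tmpdir/file2"

# Print the lines of file2 whose first column never appears in file1.
result=$(awk 'BEGIN{FS=OFS="|"} FNR==NR{unique[$1]; next} !($1 in unique)' \
    "$tmpdir/file1" "$tmpdir/file2")
printf '%s\n' "$result"

rm -rf "$tmpdir"
```

The first awk call is only there for inspection: FNR==NR holds for every line of file1 and for no line of file2, which is why it works as a "first file only" test.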
Upvotes: 4
Reputation: 17169
Here is another one-liner that uses join, sort and grep:
join -t"|" -j 1 -a 2 <(sort -t"|" -k1,1 file1) <(sort -t"|" -k1,1 file2) |\
grep -E -v '.*\|.*\|.*\|.*\|'
join does two things here. It pairs all lines from both files with matching keys and, with the -a 2 option, also prints the unmatched lines from file2. Since join requires its input files to be sorted on the join field, we sort them first. Finally, grep removes every line that contains four or more | delimiters, that is, the joined lines, so only the unmatched lines from file2 remain.
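As a variation on this (not the command above, just a sketch), join also has a standard -v option that prints only the unpairable lines of one file, which would make the grep step unnecessary. The version below uses temporary sorted files instead of process substitution so it does not depend on bash; the scratch directory and the variable name unmatched are assumptions for the demo:

```shell
# Recreate the sample data from the question in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' '123|yid|def|' '456|kks|jkl|' '789|mno|vsasd|' > "$tmpdir/file1"
printf '%s\n' '123|abc|def|' '456|ghi|jkl|' '789|mno|pqr|' '134|rst|uvw|' > "$tmpdir/file2"

# join needs both inputs sorted on the join field (column 1).
sort -t'|' -k1,1 "$tmpdir/file1" > "$tmpdir/file1.sorted"
sort -t'|' -k1,1 "$tmpdir/file2" > "$tmpdir/file2.sorted"

# -v 2 prints only the lines of the second file that have no match in the first.
unmatched=$(join -t'|' -v 2 "$tmpdir/file1.sorted" "$tmpdir/file2.sorted")
printf '%s\n' "$unmatched"

rm -rf "$tmpdir"
```

With the sample data this should leave just the 134 line from file2.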
Upvotes: 1