Reputation: 1
I need to compare two files column by column using unix shell, and store the difference in a resulting file.
For example if column 1 of the 1st record of the 1st file matches the column 1 of the 1st record of the 2nd file then the result will be stored as '=' in the resulting file against the column, but if it finds any difference in column values the same need to be printed in the resulting file.
Below is the exact requirement.
File 1:
id code name place
123 abc Tom phoenix
345 xyz Harry seattle
675 kyt Romil newyork
File 2:
id code name place
123 pkt Rosy phoenix
345 xyz Harry seattle
421 uty Romil Sanjose
Expected resulting file:
id_1 id_2 code_1 code_2 name_1 name_2 place_1 place_2
= = abc pkt Tom Rosy = =
= = = = = = = =
675 421 kyt uty = = Newyork Sanjose
Columns are tab delimited.
Upvotes: 0
Views: 712
Reputation: 2384
This is rather crudely coded, but shows a way to use awk
to emit what you want, and can handle files of identical "schema" - not just the particular 4-field files you give as tests.
This approach uses pr
to do a simple merge of the files: the same line of each input file is concatenated to present one line to the awk
script.
The awk
script assumes clean input, and uses the fact that if a variable n
has the value 2, the value of $n
in the script is the the same as $2
. So, the script walks though pairs of fields using the i
and j
variables. For your test input, fields 1 and 5, then 2 and 6, etc., are processed.
Only very limited testing of input is performed: mainly, that the implied schema of the two input files (the names of columns/fields) is the same.
#!/bin/sh
[ $# -eq 2 ] || { echo "Usage: ${0##*/} <file1> <file2>" 1>&2; exit 1; }
[ -r "$1" -a -r "$2" ] || { echo "$1 or $2: cannot read" 1>&2; exit 1; }
set -e
pr -s -t -m "$@" | \
awk '
{
offset = int(NF/2)
tab = ""
for (i = 1; i <= offset; i++) {
j = i + offset
if (NR == 1) {
if ($i != $j) {
printf "\nColumn name mismatch (%s/%s)\n", $i, $j > "/dev/stderr"
exit
}
printf "%s%s_1\t%s_2", tab, $i, $j
} else if ($i == $j) {
printf "%s=\t=", tab
} else {
printf "%s%s\t%s", tab, $i, $j
}
tab = "\t"
}
printf "\n"
}
'
Tested on Linux: GNU Awk 4.1.0 and pr (GNU coreutils) 8.21.
Upvotes: 1