Angelo
Angelo

Reputation: 5059

matching columns in two files and modifying the resulting file

I have two files one with 3 columns and one with 4 columns. They look like this

file 1:

air 0.1 0.2
soil 0.9 0.7
water 0.4 0.6

file 2:

temp1 0.1 0.2 air
temp2 0.5 0.6 .
temp3 0.6 0.3 water

If the values of column 4 in file 2 match the value of column1 in file the output should be like:

temp1 0.1 0.2 air 0.1

that is all values of file 2 and the first two columns of file 1. in case of mismatch the whole row of file 2 should be printed as it is:

So the final output be

temp1 0.1 0.2 air 0.1
temp2 0.5 0.6 .
temp3 0.6 0.3 water 0.4

if it was a normal matching of the values from two files something like this could be used:

for i in `cat file 1`; do awk '{if($4=="'$i'") print $0_}'<file2 >>output; done

however, the code requires more than this.

could anyone help me in fixing this.

Thank you

Upvotes: 1

Views: 56

Answers (3)

KamilCuk
KamilCuk

Reputation: 141020

This really looks like a classical use of join. The join utility is used to join the files on specific fields (the files have to be sorted). This answer doesn't use awk, dunno if it's a problem.

cat <<EOF >file1
air 0.1 0.2
soil 0.9 0.7
water 0.4 0.6
EOF
cat <<EOF >file2
temp1 0.1 0.2 air
temp2 0.5 0.6 .
temp3 0.6 0.3 water
EOF

# separator is space
# join on the first field from first file
# join on the firth field from the second file
# in case the lines are not matched, print the line from second file
# output - first output 4 fields from file 2 and second field from file 2
#          it is the same as 3 fields from file 2 and 2 fields from file 1
join -t' ' -11 -24 -a2 -o 2.1,2.2,2.3,2.4,1.2 file1 file2

will output:

temp1 0.1 0.2 air 0.1
temp2 0.5 0.6 .
temp3 0.6 0.3 water 0.4

tested on repl.

In case your input files are not sorted, you need to sort them beforehand on specific fields:

 join -t' ' -11 -24 -a2 -o 2.1,2.2,2.3,2.4,1.2 <(<file1 sort -t' ' -k1) <(<file2 sort -t' ' -k4)

In case your input files are not sorted and you need to preserve the sorting order from the file 2, number the lines in file 2, join them, sort the output using the line numbers from file2 and remove those line numbers:

 join -t' ' -11 -25 -a2 -o 2.1,2.2,2.3,2.4,2.5,1.2 <(<file1 sort -t' ' -k1) <(<file2 nl -w1 -s' ' | sort -t' ' -k5) | sort -t' ' -k1 | cut -d' ' -f2-

Upvotes: 4

Ed Morton
Ed Morton

Reputation: 203512

$ awk 'NR==FNR{a[$1]=$2; next} {print $0 ($4 in a ? OFS a[$4] : "")}' file1 file2
temp1 0.1 0.2 air 0.1
temp2 0.5 0.6 .
temp3 0.6 0.3 water 0.4

Upvotes: 3

choroba
choroba

Reputation: 241868

Perl to the rescue!

#!/usr/bin/perl
use warnings;
use strict;

my %F1;
open my $f1, '<', shift or die $!;
while (<$f1>) {
    my ($id, $value) = split;
    warn "Duplicate entry for $id.\n" if exists $F1{$id};
    $F1{$id} = $value;
}

open my $f2, '<', shift or die $!;
while (<$f2>) {
    my ($val0, $val1, $val2, $id) = split;
    print join ' ', $val0, $val1, $val2, $id,
        $F1{$id} x exists $F1{$id}, "\n";
}

Save as match-cols, run as perl match-cols file1 file2.

It stores the values from file 1 in a hash map, then reads file 2 line by line and outputs either the line itself if the id is not found in the hash map, or the line plus the information stored in the hash map.

Upvotes: 2

Related Questions