user7249622
user7249622

Reputation:

how to get the common rows according to the first column in awk

I have two ',' separated files as follow:

file1:

A,inf
B,inf
C,0.135802
D,72.6111
E,42.1613

file2:

A,inf
B,inf
C,0.313559
D,189.5
E,38.6735

I want to compare 2 files ans get the common rows based on the 1st column. So, for the mentioned files the out put would look like this:

A,inf,inf
B,inf,inf
C,0.135802,0.313559
D,72.6111,189.5
E,42.1613,38.6735

I am trying to do that in awk and tried this:

awk ' NR == FNR {val[$1]=$2; next} $1 in val {print $1, val[$1], $2}' file1 file2 

this code returns this results:

A,inf
B,inf
C,0.135802
D,72.6111
E,42.1613

which is not what I want. do you know how I can improve it?

Upvotes: 2

Views: 56

Answers (2)

hek2mgl
hek2mgl

Reputation: 157992

Your awk code basically works, you are just missing to tell awk to use , as the field delimiter. You can do it by adding BEGIN{FS=OFS=","} to the beginning of the script.

But having that the files are sorted like in the examples in your question, you can simply use the join command:

join -t, file1 file2

This will join the files based on the first column. -t, tells join that columns are separated by commas.

If the files are not sorted, you can sort them on the fly like this:

join -t, <(sort file1) <(sort file2)

Upvotes: 1

James Brown
James Brown

Reputation: 37404

$ awk 'BEGIN{FS=OFS=","}NR==FNR{a[$1]=$0;next}$1 in a{print a[$1],$2}' file1 file2
A,inf,inf
B,inf,inf
C,0.135802,0.313559
D,72.6111,189.5
E,42.1613,38.6735

Explained:

$ awk '
BEGIN {FS=OFS="," }  # set separators
NR==FNR {            # first file
    a[$1]=$0         # hash to a, $1 as index
    next             # next record
}
$1 in a {            # second file, if $1 in a
    print a[$1],$2   # print indexed record from a with $2
}' file1 file2

Upvotes: 1

Related Questions