anmonu
anmonu

Reputation: 169

Printing Columns of two files if content of first column in both file matches in Unix

I have two files as below

file1
name|address office
AK|Victoria Street
BK|Admond Street
DK|Business Street

file2
name|address home
AK|Nilofer Villa
ck|Bluewaters
bk|Homingo Apartment

the command or line of code should compare the first column of the two files and merge the columns as name|address office|address home, and replace the NA wherever is not matched,content of the files can be huge. full output should be as below

file3
name|address office|address home
AK|Victoria Street |Nilofer Villa
BK|Admond Street|Homingo Apartment
DK|Business Street|NA
CK|NA|Bluewaters

here is what I have tried so far:

awk -F '|' 'NR==FNR{c[$1]++;next};c[$1] > 0' file1 file2

but above lines of code are not merging, just producing the output as difference based on column name. that too case sensitive
name|address home AK|Nilofer Villa

Please Help, have checked few questions also, but not solving my purpose.

Upvotes: 1

Views: 74

Answers (2)

Ed Morton
Ed Morton

Reputation: 203522

$ cat tst.awk
BEGIN { FS=OFS="|" }
{
    name = (FNR>1 ? toupper($1) : $1)
    if (!seen[name]++) {
        names[++numNames] = name
        vals[name,1] = vals[name,2] = "NA"
    }
    vals[name,ARGIND] = $2
}
END {
    for (nameNr=1; nameNr<=numNames; nameNr++) {
        name = names[nameNr]
        print name, vals[name,1], vals[name,2]
    }
}

$ awk -f tst.awk file1 file2
name|address office|address home
AK|Victoria Street|Nilofer Villa
BK|Admond Street|Homingo Apartment
DK|Business Street|NA
CK|NA|Bluewaters

The above uses GNU awk for ARGIND, with other awks just add FNR==1{ARGIND++} at the start of the script.

Upvotes: 2

bishop
bishop

Reputation: 39414

You can do this with the join command:

$ join -a 1 -a 2 -e NA -o '0,1.2,2.2' -t '|' -i <(sort f1) <(sort f2)
AK|Victoria Street|Nilofer Villa
BK|Admond Street|Homingo Apartment
ck|NA|Bluewaters
DK|Business Street|NA
name|address office|address home

Where:

  • -a 1 and -a 2 include the unjoined lines from either file.
  • -e and -o work together to show the "NA" field. The man page doesn't mention this, but to use -e you must specify -o. We simply show the fields in this order: join column, second column from first file, second column from second file.
  • -t sets the delimiter

Of course we must also sort the files prior to using join (it's requisite), so we use process substitution. If your shell doesn't have that, you may use temporary files.

Upvotes: 4

Related Questions