Reputation: 3344
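I would like to join FILE1 and FILE2 with awk by matching the 4th column of FILE1 against the 1st column of FILE2, and append the 2nd column of FILE1 to each line of FILE2. If there is more than one match (there could be more than 100), print the values separated by commas; if there is no match, print NA.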
FILE1:
alo descrip 1 PAPA
alo descrip 2 LOPA
alo descrip 3 REP
alo descrip 4 SEPO
dlo sapro 31 REP
dlo sapro 35 PAPA
FILE2:
PAPA klob trop
PAPA kopo topo
HOJ sasa laso
REP deso rez
SEPO raz ghul
REP kok loko
OUTPUT:
PAPA klob trop descrip,sapro
PAPA kopo topo descrip,sapro
HOJ sasa laso NA
REP deso rez descrip,sapro
SEPO raz ghul descrip
REP kok loko descrip,sapro
I tried:
awk -v FILE_A="FILE1" -v OFS="\t" 'BEGIN { while ( ( getline < FILE_A ) > 0 ) { VAL = $0 ; sub( /^[^ ]+ /, "", VAL ) ; DICT[ $1 ] = VAL } } { print $0, DICT[ $4 ] }' FILE2
but it doesn't work.
Upvotes: 1
Views: 248
Reputation: 133438
Could you please try the following.
awk '
FNR==NR{
a[$NF]=(a[$NF]?a[$NF] ",":"")$2
next
}
{
printf("%s %s\n",$0,($1 in a)?a[$1]:"NA")
}
' Input_file1 Input_file2
Explanation: Adding a detailed explanation for the above code.
awk ' ##Starting the awk program from here.
FNR==NR{ ##Checking condition FNR==NR, which will be TRUE only while Input_file1 is being read.
a[$NF]=(a[$NF]?a[$NF] ",":"")$2 ##Creating array a indexed by $NF (the last field, i.e. the 4th column); each new $2 is appended to the existing value, separated by a comma.
next ##next skips all further statements for the current line.
}
{
printf("%s %s\n",$0,($1 in a)?a[$1]:"NA") ##Printing the current line followed by either the array value or NA, depending on whether $1 exists in array a.
}
' Input_file1 Input_file2 ##Mentioning the Input_file names here.
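To check the answer against the sample data from the question, here is a small reproducible run. The wrapper name `join_files` and the scratch-directory setup are hypothetical. Note that `$NF` equals the 4th column only because every FILE1 line has exactly four fields. If lines could have extra fields, `$4` would be the safer key.

```shell
# Work in a scratch directory so the sample files do not clobber anything.
cd "$(mktemp -d)" || exit 1

# Sample data copied from FILE1/FILE2 in the question.
printf '%s\n' 'alo descrip 1 PAPA' 'alo descrip 2 LOPA' 'alo descrip 3 REP' \
    'alo descrip 4 SEPO' 'dlo sapro 31 REP' 'dlo sapro 35 PAPA' > Input_file1
printf '%s\n' 'PAPA klob trop' 'PAPA kopo topo' 'HOJ sasa laso' \
    'REP deso rez' 'SEPO raz ghul' 'REP kok loko' > Input_file2

# The answer above, wrapped in a hypothetical function: join_files FILE1 FILE2
join_files() {
    awk '
    FNR==NR{                              # first file: collect column 2 per last-column key
      a[$NF]=(a[$NF]?a[$NF] ",":"")$2
      next
    }
    {                                     # second file: append matches, or NA
      printf("%s %s\n",$0,($1 in a)?a[$1]:"NA")
    }
    ' "$1" "$2"
}

join_files Input_file1 Input_file2
```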
Upvotes: 3
Reputation: 37394
In essence, the question was how to store data in an array when there are duplicate keys. @RavinderSingh13 demonstrated gloriously how to append data to indexed array elements. Another way is to use multidimensional arrays (GNU awk's arrays of arrays). Here is a sample of how to use them in GNU awk:
$ gawk ' # using GNU awk
NR==FNR { # process first file
a[$4][++c[$4]]=$2 # 2d array
next
}
{ # process second file
printf "%s%s",$0,OFS # print the record
if($1 in a) # if key is found in array
for(i=1;i<=c[$1];i++) # process related dimension
printf "%s%s",a[$1][i],(i==c[$1]?ORS:",") # and output elements
else # if key was not in array
print "NA" # output NA
}' file1 file2
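Arrays of arrays (`a[$4][...]`) are a GNU awk extension. The following is a hedged sketch of the same counter-based approach in portable POSIX awk. It uses the classic comma subscript `a[key, n]`, which awk stores as one string key joined by SUBSEP. The function name `join_posix` and the sample-file setup are hypothetical.

```shell
# Work in a scratch directory so the sample files do not clobber anything.
cd "$(mktemp -d)" || exit 1

# Sample data copied from FILE1/FILE2 in the question.
printf '%s\n' 'alo descrip 1 PAPA' 'alo descrip 2 LOPA' 'alo descrip 3 REP' \
    'alo descrip 4 SEPO' 'dlo sapro 31 REP' 'dlo sapro 35 PAPA' > file1
printf '%s\n' 'PAPA klob trop' 'PAPA kopo topo' 'HOJ sasa laso' \
    'REP deso rez' 'SEPO raz ghul' 'REP kok loko' > file2

# Hypothetical helper: join_posix FILE1 FILE2
join_posix() {
    awk '
    NR == FNR {                       # first file
        n = ++c[$4]                   # number of values seen so far for this key
        a[$4, n] = $2                 # emulated 2D array, subscripts joined by SUBSEP
        next
    }
    {                                 # second file
        printf "%s%s", $0, OFS        # print the record
        if ($1 in c)                  # key was seen in the first file
            for (i = 1; i <= c[$1]; i++)
                printf "%s%s", a[$1, i], (i == c[$1] ? ORS : ",")
        else
            print "NA"
    }' "$1" "$2"
}

join_posix file1 file2
```

Keeping the count in `c` gives both the membership test and the loop bound without scanning the emulated array's keys.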
Upvotes: 3