Reputation: 687
I have two files that look like this:
File 1
mir1 CAT1;DEM20;SCD;LIART;COLECC2
mir2 ELAM2;SIRT1;FROMO;PER1;PER2
File 2
mir1 DEM20;LIART;ACACA;FOXO1;DIPEM
mir2 ELAM2;SIRT1;FROMO;PER1;PER2
I want to compare column 2 of the two files and count, for each row, how many names match. The names are separated by ";", and the number of names in column 2 can vary, so this is just an example.
The desired output is the number of matches per row, for example:
File 3
mir1 2
mir2 5
There are 2 matches for the first row between the two files, and 5 matches for the second row.
I have tried formatting each name as a column with awk, but ended up with many columns and too many comparisons at once.
Any help?
Thanks
Upvotes: 0
Views: 32
Reputation: 67467
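$ awk -v s=";" '
    # file1: store each row's list wrapped in separators, keyed by column 1
    NR==FNR {a[$1]=s $2 s; next}
    # file2: count the names that occur as ";name;" in the stored list
    {c=0; n=split($2,b,s);
     for(i=1;i<=n;i++) c+=(a[$1] ~ s b[i] s);
     print $1,c}' file1 file2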
mir1 2
mir2 5
NB: this uses regex matching instead of string equality, so it should work fine as long as the values contain no regex special characters (such as "." or "+").
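If the names might contain regex special characters, one possible variant is to compare by exact string equality instead. The following is only a sketch under the same assumptions as the question (whitespace-separated columns, key in column 1, names separated by ";"); the function name count_matches and the array name seen are made up for illustration.

```shell
# Hypothetical helper: count_matches FILE1 FILE2
# For each row of FILE2, print the key and how many of its names
# also appear under the same key in FILE1 (exact string comparison).
count_matches() {
  awk -v s=";" '
    NR == FNR {                 # first file: remember every (key, name) pair
      n = split($2, b, s)
      for (i = 1; i <= n; i++) seen[$1, b[i]] = 1
      next
    }
    {                           # second file: count names already seen for this key
      c = 0
      n = split($2, b, s)
      for (i = 1; i <= n; i++) if (($1, b[i]) in seen) c++
      print $1, c
    }' "$1" "$2"
}
```

Calling it as count_matches file1 file2 on the sample files from the question should print the same two lines as above. As with the regex version, a name repeated within one row of file2 is counted each time it appears.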
Upvotes: 1