Find matches within columns in two files

I have two files that look like this:

File 1

mir1    CAT1;DEM20;SCD;LIART;COLECC2
mir2    ELAM2;SIRT1;FROMO;PER1;PER2

File 2

mir1    DEM20;LIART;ACACA;FOXO1;DIPEM
mir2    ELAM2;SIRT1;FROMO;PER1;PER2

I want to compare column 2 of both files and count, for each row, how many names match. The names are separated by ";", and the number of names in column 2 can vary, so this is just an example.

The desired output is a count of matches per row, for example:

File 3

mir1    2
mir2    5

There are 2 matches between the two files for the first row, and 5 for the second.

I have tried splitting each name into its own column with awk, but ended up with too many columns and comparisons to handle at once.

Any help?

Thanks

Upvotes: 0

Views: 32

Answers (1)

karakfa

Reputation: 67467

$ awk -v s=";" 'NR==FNR {a[$1]=s $2 s; next} 
                        {c=0; n=split($2,b,s); 
                         for(i=1;i<=n;i++) c+=(a[$1] ~ s b[i] s); 
                         print $1,c}' file1 file2

mir1 2
mir2 5

NB: this uses regex matching instead of string equality. It should work fine as long as the values contain no regex special characters.
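If the names might contain regex special characters (a "." for instance), one possible variation is to swap the regex match for a literal substring test with awk's index(). This is only a sketch under the same assumptions as above (whitespace-separated columns, ";" between names); the wrapper function name count_matches is made up for illustration and is not part of the original answer.

```shell
# count_matches FILE1 FILE2
# Hypothetical helper name; prints "key count" for every row of FILE2.
count_matches() {
    awk -v s=";" '
        # First file: store each list wrapped in separators, e.g. ";A;B;C;"
        NR == FNR { a[$1] = s $2 s; next }

        # Second file: count names that occur literally in the stored list
        {
            c = 0
            n = split($2, b, s)
            for (i = 1; i <= n; i++)
                if (index(a[$1], s b[i] s) > 0)   # literal match, no regex
                    c++
            print $1, c
        }
    ' "$1" "$2"
}
```

Called as count_matches file1 file2, it should give the same counts as above for the sample data. As in the regex version, a name that is repeated within a row of the second file is counted once per repetition.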

Upvotes: 1
