Reputation: 75
I have two files
FileA.txt
ID
479432_Sros_4274
330214_NIDE2792
517722_CJLT1_010100003977
257310_BB0482
...
FileB.txt (the ** markers are only there to help you identify the matches)
members category
6085.XP_002168109,**479432_Sros_4274**,4956.XP_002495993.1,457425.SSHG_03214,51511.ENSCSAVP000 P
7159.AAEL006372-PA,**257310_BB0482** J
**517722_CJLT1_010100003977**,701176.VIBRN418_17773,9785.ENSLAFP00000010769,28377.ENSACAP00000014901,4081.Solyc03g120250.2.1,3847.GLYMA18G02240.1 U
500485.XP_002561312.1,1042876.PPS_0730,222929.XP_003071446.1,**330214_NIDE2792** S
...
Expected output
Output.txt
ID category
479432_Sros_4274 P
330214_NIDE2792 S
517722_CJLT1_010100003977 U
257310_BB0482 J
...
I have tried some code in awk and R based on answers to other questions, but I could not get the desired output.
Upvotes: 1
Views: 609
Reputation: 133438
Could you please try the following.
awk '
BEGIN{
print "ID category"
}
FNR==NR{
if(FNR>1){
a[$0]
}
next
}
{
for(i in a){
if(match($0,i)){
print i,$NF
}
}
}
' Input_filea Input_fileb
Explanation: here is the same code with comments.
awk ' ##Starting awk program here.
BEGIN{ ##Starting BEGIN section from here.
print "ID category" ##Printing the header line "ID category".
} ##Closing BLOCK for BEGIN section.
FNR==NR{ ##Checking condition FNR==NR, which is TRUE only while the 1st Input_file is being read.
if(FNR>1){ ##Skipping the header line "ID" of the 1st Input_file; otherwise "ID" would also match inside 330214_NIDE2792.
a[$0] ##Creating an array named a whose index is $0 (the whole line, i.e. one ID).
} ##Closing BLOCK for the if condition.
next ##next will skip all further statements from here.
}
{
for(i in a){ ##Traversing through array a with for loop.
if(match($0,i)){ ##Checking if i, treated as a regex, matches anywhere in the current line; if so, do the following.
print i,$NF ##Printing variable i and $NF of current line.
}
}
}
' Input_filea Input_fileb ##Mentioning Input_file names here.
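The loop above tests every ID against every line of FileB as a regex, so the work grows with (number of IDs) × (number of lines) and an ID can also match as a substring of a longer member. A minimal sketch of an alternative, assuming the first field of FileB is a comma-separated member list and the last field is the category: split the member list on commas and look each member up in a hash of the IDs, then print in FileA order to match the question's expected output. The function name `join_ids` and the temporary sample files are hypothetical, for illustration only.

```shell
#!/bin/sh
# Sketch only. Assumes FileB lines are "<comma-separated members> <category>"
# and FileA has one ID per line after an "ID" header. join_ids is a hypothetical name.
join_ids() {
  awk '
    BEGIN { print "ID category" }
    FNR == NR {                      # first file (FileA)
      if (FNR > 1) {                 # skip the "ID" header line
        order[++k] = $1              # remember FileA order for the output
        want[$1]                     # hash the ID for exact lookup
      }
      next
    }
    {                                # second file (FileB)
      n = split($1, m, ",")          # split the member list on commas
      for (j = 1; j <= n; j++)
        if (m[j] in want)            # exact match, no regex involved
          cat[m[j]] = $NF            # if an ID is on several lines, the last one wins
    }
    END {
      for (i = 1; i <= k; i++)       # print in FileA order
        if (order[i] in cat)
          print order[i], cat[order[i]]
    }
  ' "$1" "$2"
}

# Sample data taken from the question (without the ** markers).
tmp=$(mktemp -d)
printf '%s\n' ID 479432_Sros_4274 330214_NIDE2792 \
  517722_CJLT1_010100003977 257310_BB0482 > "$tmp/FileA.txt"
cat > "$tmp/FileB.txt" <<'EOF'
members category
6085.XP_002168109,479432_Sros_4274,4956.XP_002495993.1,457425.SSHG_03214,51511.ENSCSAVP000 P
7159.AAEL006372-PA,257310_BB0482 J
517722_CJLT1_010100003977,701176.VIBRN418_17773,9785.ENSLAFP00000010769,28377.ENSACAP00000014901,4081.Solyc03g120250.2.1,3847.GLYMA18G02240.1 U
500485.XP_002561312.1,1042876.PPS_0730,222929.XP_003071446.1,330214_NIDE2792 S
EOF

join_ids "$tmp/FileA.txt" "$tmp/FileB.txt"
```

With the question's files this would be run as `join_ids FileA.txt FileB.txt > Output.txt`. Only IDs from FileA are kept in memory, so a large FileB is streamed rather than loaded.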
Upvotes: 3
Reputation: 37394
This is one way of doing it:
$ awk '
NR==FNR { # process file1
if(FNR==1) # print header, no newline
printf "%s", $1
a[$1] # hash data
next
}
{ # process file2
if(FNR==1) # print the other half of the header
print OFS $2
for(i in a) # loop all items in hash
if($1 ~ i) # check for partial match
print i,$2 # if found, output
}' file1 file2 # mind the order
Output (in file2 order; notice the partial match of ID in the last line of output, left as a warning: the header ID from file1 is hashed too and matches inside 330214_NIDE2792):
ID category
479432_Sros_4274 P
257310_BB0482 J
517722_CJLT1_010100003977 U
330214_NIDE2792 S
ID S
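One way to suppress that stray line while keeping this answer's regex loop is sketched below, assuming the members in file2's first field are separated by commas: skip the header when hashing file1, and require each ID to sit between commas or at either end of the field. The function name `anchored_join` and the sample files are hypothetical; the data lines are an excerpt of the question's input.

```shell
#!/bin/sh
# Sketch only: the answer's regex approach with two tweaks.
# anchored_join is an illustrative name, not from the answer.
anchored_join() {
  awk '
    NR == FNR {                        # process file1
      if (FNR == 1) printf "%s", $1    # print "ID" with no newline
      else a[$1]                       # hash IDs, but not the header
      next
    }
    FNR == 1 { print OFS $2; next }    # finish the header line
    {
      for (i in a)                     # ID must be a whole comma-separated item
        if ($1 ~ ("(^|,)" i "(,|$)"))
          print i, $2
    }
  ' "$1" "$2"
}

# Excerpt of the question's data, including the NIDE line.
tmp=$(mktemp -d)
printf '%s\n' ID 257310_BB0482 330214_NIDE2792 > "$tmp/file1"
printf '%s\n' 'members category' \
  '7159.AAEL006372-PA,257310_BB0482 J' \
  '500485.XP_002561312.1,1042876.PPS_0730,222929.XP_003071446.1,330214_NIDE2792 S' \
  > "$tmp/file2"

anchored_join "$tmp/file1" "$tmp/file2"
```

Either tweak alone removes the ID S line here. The IDs are still used as regexes, so an ID containing a metacharacter such as "." would need escaping; the IDs in the question contain only letters, digits and underscores.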
Upvotes: 4