Reputation: 3022
I am trying to use awk to match the contents of each line in file with $2 in list. Both files are tab-delimited, and the name being matched in list may contain a space or a special character. For example, the name in file is BRCA1 but in list it is BRCA 1, or the name in file is BCR but in list it is BCR/ABL.
If there is a match and $4 of list has full gene sequence in it, then $2 and $1 are printed, separated by a tab. If no match is found, then the name that was not matched and 14 are printed, separated by a tab. The awk below does execute, but it produces no output. Thank you :).
file
BRCA1
BCR
SCN1A
fbn1
list
List code gene gene name methodology
81 DMD dystrophin deletion analysis and duplication analysis
811 BRCA 1 BRCA2 full gene sequence and full deletion/duplication analysis
70 ABL1 ABL1 gene analysis variants in the kinse domane
71 BCR/ABL t(9;22) full gene sequence
awk
awk -F'\t' -v OFS="\t" 'FNR==NR{A[$1]=$0;next} ($2 in A){if($4=="full gene sequence"){print A[$2],$1}} ELSE {print A[$2],"14"}' file list
desired output
BRCA1 811
BCR 71
SCN1A 14
fbn1 85
edit
List code gene gene name methodology
85 fbn1 Fibrillin full gene sequencing
95 FBN1 fibrillin del/dup
result
85 fbn1 Fibrillin full gene sequencing
Since only this line has full gene sequencing in it, only this line is printed.
Upvotes: 0
Views: 143
Reputation: 8174
You can try:
awk 'BEGIN{FS=OFS="\t"}
FNR==NR{
if(NR>1){
gsub(" ","",$2) #removing white space
n=split($2,v,"/")
d[v[1]] = $1 #from split, first element as key
}
next
}{print $1, ($1 in d?d[$1]:14)}' list file
You get:
BRCA1	811
BCR	71
SCN1A	14
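The command above does not look at the methodology column, which the question also asks about. A minimal sketch of that extra check follows. It assumes the methodology is field 4 of a tab-delimited list, and that a substring test for "full gene sequenc" is what is wanted, so that both "full gene sequence" and "full gene sequencing" count. The temporary directory and the `dir` and `result` variable names are only for illustration; the sample rows are copied from the question, including the two rows from its edit.

```shell
# Recreate the sample files with real tabs (printf reuses the format per row).
dir=$(mktemp -d)
printf 'BRCA1\nBCR\nSCN1A\nfbn1\n' > "$dir/file"
printf '%s\t%s\t%s\t%s\n' \
  'List code' 'gene' 'gene name' 'methodology' \
  '81' 'DMD' 'dystrophin' 'deletion analysis and duplication analysis' \
  '811' 'BRCA 1' 'BRCA2' 'full gene sequence and full deletion/duplication analysis' \
  '70' 'ABL1' 'ABL1' 'gene analysis variants in the kinse domane' \
  '71' 'BCR/ABL' 't(9;22)' 'full gene sequence' \
  '85' 'fbn1' 'Fibrillin' 'full gene sequencing' \
  '95' 'FBN1' 'fibrillin' 'del/dup' > "$dir/list"

result=$(awk 'BEGIN { FS = OFS = "\t" }
FNR == NR {
    # Keep a row only if it is not the header and field 4 mentions
    # full gene sequencing (substring test, an assumption).
    if (FNR > 1 && index($4, "full gene sequenc")) {
        gsub(/ /, "", $2)                  # "BRCA 1" becomes "BRCA1"
        split($2, v, "/")                  # "BCR/ABL" gives v[1] = "BCR"
        if (!(v[1] in d)) d[v[1]] = $1     # first matching row wins
    }
    next
}
# Second file: print the name and its code, or 14 when nothing was stored.
{ print $1, ($1 in d ? d[$1] : 14) }' "$dir/list" "$dir/file")

printf '%s\n' "$result"
```

With this sample data the sketch should print BRCA1 811, BCR 71, SCN1A 14 and fbn1 85, each pair separated by a tab. The lookup is case-sensitive, so the FBN1 row would not be matched by fbn1 even if its methodology qualified.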
Upvotes: 1
Reputation: 16997
awk 'FNR==NR{
a[$2]=$1;
next
}
{
for(i in a){
if($1 ~ i || i ~ $1){ print $1, a[i] ; next }
}
print $1,14
}' list file
Input
$ cat list
List code gene gene name methodology
81 DMD dystrophin deletion analysis and duplication analysis
811 BRCA 1 BRCA2 full gene sequence and full deletion/duplication analysis
70 ABL1 ABL1 gene analysis variants in the kinse domane
71 BCR/ABL t(9;22) full gene sequence
$ cat file
BRCA1
BCR
SCN1A
Output
$ awk 'FNR==NR{
a[$2]=$1;
next
}
{
for(i in a){
if($1 ~ i || i ~ $1){ print $1, a[i] ; next }
}
print $1,14
}' list file
BRCA1 811
BCR 71
SCN1A 14
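One thing to keep in mind with the loop above: the ~ operator treats its right-hand side as a regular expression, so a name containing characters such as ( or + could match unexpectedly or raise an error. A hedged variation, assuming plain substring matching is what is intended, uses index() instead and also skips the header row of list. The temporary directory and the `dir` and `result` names are illustrative only.

```shell
# Recreate the sample files from the question (tab-delimited list).
dir=$(mktemp -d)
printf 'BRCA1\nBCR\nSCN1A\n' > "$dir/file"
printf '%s\t%s\t%s\t%s\n' \
  'List code' 'gene' 'gene name' 'methodology' \
  '81' 'DMD' 'dystrophin' 'deletion analysis and duplication analysis' \
  '811' 'BRCA 1' 'BRCA2' 'full gene sequence and full deletion/duplication analysis' \
  '70' 'ABL1' 'ABL1' 'gene analysis variants in the kinse domane' \
  '71' 'BCR/ABL' 't(9;22)' 'full gene sequence' > "$dir/list"

result=$(awk 'FNR == NR {
    if (FNR > 1) a[$2] = $1    # skip the header; default whitespace splitting
    next
}
NF {
    # index() compares literal text, so no character is special.
    for (i in a)
        if (index($1, i) || index(i, $1)) { print $1, a[i]; next }
    print $1, 14               # no key contained, or was contained in, the name
}' "$dir/list" "$dir/file")

printf '%s\n' "$result"
```

As in the original loop, the match is a two-way substring test, so a short name can still match a longer unrelated one; the order in which for (i in a) visits keys is unspecified, which matters if more than one key could match.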
Upvotes: 1