Reputation: 1
I used awk and grep in a pipeline to produce the following contents, which I will call A (an excerpt from my file):
LOC_Os04g47290
LOC_Os04g53190,LOC_Os04g53195
LOC_Os09g20260
I want to use these IDs to find the matching lines, together with their other fields, in file B (an excerpt from my file):
_O2 int381,int382,int384,int385,int386,int387,int388,int391,int392,int393,int394,int395,int396,int397,int398,int399,int400,int401,int402,int403,int404,int408,int409,int410,int412,int413,int414:chr4:31119012..31944575 chr4:31669055..31674598 LOC_Os04g53190,LOC_Os04g53195 CPuORF12,expressed - conserved peptide uORF-containing transcript, expressed ; protein ; PF01593 Amino_oxidase 0.0539946
When I run
cat a|awk -F"," '{for (i=1;i<=NF;i++)print $i}'|grep -f - B|grep PF|awk '{print $4"\t"$(NF-2)}'
I get
LOC_Os04g53190,LOC_Os04g53195 PF01593
But I want to print
LOC_Os04g53190 PF01593
LOC_Os04g53195 PF01593
Upvotes: 0
Views: 171
Reputation: 8164
Improving the last awk statement:
cat a |
awk -F"," '{for (i=1;i<=NF;i++)print $i}' |
grep -f - B |
grep PF |
awk '{n=split($4,v,","); for(i=1; i<=n; ++i) print v[i]"\t"$(NF-2)}'
This gives:
LOC_Os04g53190 PF01593
LOC_Os04g53195 PF01593
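The key change is the call to split(), which breaks the comma-separated field into an array and returns the number of pieces. A minimal sketch of just that step, assuming a hypothetical two-field sample line rather than the real file B:

```shell
# Hypothetical one-line sample: field 1 holds comma-separated IDs, field 2 a Pfam ID.
line='LOC_Os04g53190,LOC_Os04g53195 PF01593'

# split() fills v[1..n] and returns n, so each ID is printed on its own line.
result=$(printf '%s\n' "$line" |
  awk '{ n = split($1, v, ","); for (i = 1; i <= n; ++i) print v[i] "\t" $2 }')

printf '%s\n' "$result"
```

Each ID ends up on its own line, followed by a tab and the Pfam ID.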
Bonus: an awk-only solution:
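awk '
  # First file (a): records end at a newline or a comma, so each ID is its own record.
  NR==FNR{d[$1]; next}
  # Second file (B): keep lines whose third-from-last field is a Pfam ID.
  $(NF-2) ~ /^PF/{
    n=split($4,v,",")
    for(i=1; i<=n; ++i) if(v[i] in d) print v[i]"\t"$(NF-2)
  }
' RS="[\n,]" a RS="\n" B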
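Note that a regular-expression RS is supported by gawk and mawk, but POSIX leaves a multi-character RS unspecified. If that is a concern, the IDs in a can be split with split() instead. A rough sketch, assuming hypothetical temporary copies of a and B with the B line shortened to a few fields:

```shell
# Hypothetical stand-ins for the files a and B; the B line is trimmed for brevity.
tmp=$(mktemp -d)
printf '%s\n' 'LOC_Os04g47290' 'LOC_Os04g53190,LOC_Os04g53195' 'LOC_Os09g20260' > "$tmp/a"
printf '%s\n' '_O2 int381 chr4:31669055..31674598 LOC_Os04g53190,LOC_Os04g53195 CPuORF12 PF01593 Amino_oxidase 0.0539946' > "$tmp/B"

result=$(awk '
  # First file: split each line on commas and remember every ID.
  NR == FNR { n = split($0, ids, ","); for (i = 1; i <= n; ++i) d[ids[i]]; next }
  # Second file: keep lines whose third-from-last field is a Pfam ID.
  $(NF-2) ~ /^PF/ {
    n = split($4, v, ",")
    for (i = 1; i <= n; ++i) if (v[i] in d) print v[i] "\t" $(NF-2)
  }
' "$tmp/a" "$tmp/B")

printf '%s\n' "$result"
rm -rf "$tmp"
```

This keeps the default RS for both files, so it should behave the same in any POSIX awk.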
Upvotes: 1
Reputation: 10592
Sample file
sharad$ cat sample_file
foo
bar
sharad$
Capture matching contents into a variable
sharad$ match=$(cat sample_file | grep foo)
Capture non-matching contents into another variable
sharad$ non_match=$(cat sample_file | grep -v foo)
sharad$
Verify the contents of matching and non-matching variables (grep -v)
sharad$ echo "$match"
foo
sharad$ echo "$non_match"
bar
sharad$
From man grep
-v, --invert-match Selected lines are those not matching any of the specified patterns.
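When more than one line matches, how the variable is expanded starts to matter. A small sketch, assuming a hypothetical three-line sample and a POSIX shell such as bash or sh:

```shell
# Hypothetical sample with two matching lines, so word splitting is visible.
sample=$(printf '%s\n' 'foo 1' 'bar' 'foo 2')

match=$(printf '%s\n' "$sample" | grep foo)
non_match=$(printf '%s\n' "$sample" | grep -v foo)

# Quoted expansion keeps the newline between the two matches.
printf '%s\n' "$match"

# Unquoted expansion is word-split, so echo joins everything with single spaces.
unquoted=$(echo $match)
printf '%s\n' "$unquoted"
```

Quoting the variable preserves the line structure, which is usually what you want if the result is fed to another line-oriented tool.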
Upvotes: 0