Reputation: 251
I am searching/matching a list of terms from my source file sourcefile.txt with those in my target file target.bed. I want to print out the grep'd terms with their corresponding distance values to a separate output file.
The source file looks like this:
SMOX
NCOA3
EHF
The target file looks like this:
Chromosome PeakStart PeakEnd Distance GeneStart GeneEnd ClosestTSS_ID Symbol Strand
chr20 4100204 4100378 -29134 4129425 4168394 SMOX null +
chr20 6234586 46234754 -21075 46255745 46257534 NCOA3 null +
chr11 34622044 34622238 -20498 34642639 34668098 EHF null +
The output file should contain the grep'd text (ClosestTSS_ID and Distance):
SMOX -29134
NCOA3 -21075
EHF -20498
I have tried this script:
exec < sourcefile.txt
while read line
do
genes=$(echo $line| awk '{print $1}')
grep -w "$genes" targetfile.bed | awk '{print $4,$7}' >> outputfile.txt
done
but it doesn't work across my different source files. I have several source files that I want to process in the same loop, but the script only works for the first one. I have used the same script each time, only with different filenames.
I have tried this too:
rm sourcefile_temp.txt
touch sourcefile_temp.txt
awk 'NR>1{print $1}' sourcefile.txt > sourcefile_temp.txt
exec < sourcefile_temp.txt
while read line
do
set $line
sourcefilevar=`grep $1 targetfile.bed| cut -f4| cut -f7`
echo $line $tssmoq2 >> output.txt
done
This one gives me a really strange output.
Any suggestions/ corrections/ better ways to do this would be hugely appreciated.
Upvotes: 2
Views: 443
Reputation: 85785
This awk script will do the job:
$ awk 'FNR==NR{a[$1];next}FNR>1&&($7 in a){print $7,$4}' source target
SMOX -29134
NCOA3 -21075
EHF -20498
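Since the question mentions several source files, the same one-liner could be wrapped in a function and called once per file. This is only a sketch: the function name extract_distances, the source*.txt naming pattern and the per-source output file names are assumptions for illustration, not something given in the question.

```shell
# extract_distances SOURCE TARGET
# Prints "ClosestTSS_ID Distance" for every TARGET row whose 7th column
# appears in the first column of SOURCE. (Hypothetical helper name.)
extract_distances() {
    awk '
        # First file (NR == FNR): remember each gene name as an array key.
        FNR == NR { wanted[$1]; next }
        # Second file: skip the header line, print columns 7 and 4 on a match.
        FNR > 1 && ($7 in wanted) { print $7, $4 }
    ' "$1" "$2"
}

# Assumed naming: source1.txt, source2.txt, ... each gets its own output file.
for src in source*.txt; do
    [ -e "$src" ] || continue   # nothing matched the pattern
    extract_distances "$src" target.bed > "${src%.txt}_output.txt"
done
```

The array lookup is an exact match on the whole field, so SMOX would not match a longer ID that merely contains it, which is what grep -w was aiming for. One caveat: if a source file is empty, FNR==NR also holds for the target file and nothing is printed.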
Upvotes: 2