user1879573

Reputation: 251

Searching for text

I'm trying to write a shell script that searches for text within a file and prints out the text and associated information to a separate file.

From this file containing a list of gene IDs:

DDIT3   ENSG00000175197
DNMT1   ENSG00000129757
DYRK1B  ENSG00000105204

I want to look up each of these gene IDs (ENSG*) in a gtf file and extract its RPKM1 and RPKM2 values:

chr16   gencodeV7       gene    88772891        88781784        0.126744        +       .       gene_id "ENSG00000174177.7"; transcript_ids "ENST00000453996.1,ENST00000312060.4,ENST00000378384.3,"; RPKM1 "1.40735"; RPKM2 "1.61345"; iIDR "0.003";
chr11   gencodeV7       gene    55850277        55851215        0.000000        +       .       gene_id "ENSG00000225538.1"; transcript_ids "ENST00000425977.1,"; RPKM1 "0"; RPKM2 "0"; iIDR "NA";

and write the results to a separate output file:

Gene_ID         RPKM1   RPKM2
ENSG00000108270 7.81399 8.149
ENSG00000101126 12.0082 8.55263

I've done it on the command line for each ID using:

grep -w "ENSGno" rnaseq.gtf| awk '{print $10,$13,$14,$15,$16}' > output.file

but when it comes to writing the shell script, I've tried various combinations of for, while, read and do, and changed the variables, but without success. Any ideas would be great!

Upvotes: 0

Views: 1510

Answers (1)

fedorqui

Reputation: 289505

You can do something like:

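# Read each line of the ID list; the second column holds the Ensembl gene ID
while read -r _ id _
do
  # Append the gene_id, RPKM1 and RPKM2 fields of every matching gtf line
  grep -w "$id" rnaseq.gtf | awk '{print $10,$13,$14,$15,$16}' >> output.file
done < geneIDs.file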
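
The loop above runs grep once per ID and keeps the attribute labels, quotes and semicolons in the output. If the three-column table from the question is wanted, a single awk pass over both files might look like the sketch below. It is only an illustration under some assumptions: extract_rpkm is a made-up helper name, the field positions ($10, $14, $16) are taken from the sample gtf lines in the question and only hold if transcript_ids contains no spaces, and the version suffix (such as ".7") is stripped from gene_id before comparing with the ID list.

```shell
#!/bin/sh
# Sketch only: extract_rpkm is a hypothetical helper, not part of the original answer.
# Usage: extract_rpkm <ids_file> <gtf_file>
extract_rpkm() {
  ids_file=$1   # two columns: gene name, Ensembl gene ID
  gtf_file=$2   # gtf lines carrying gene_id, RPKM1 and RPKM2 attributes

  awk '
    BEGIN { OFS = "\t"; print "Gene_ID", "RPKM1", "RPKM2" }

    # First file (ID list): remember the wanted IDs from column 2.
    NR == FNR { if (NF >= 2) wanted[$2] = 1; next }

    # Second file (gtf): assumed layout puts the gene_id value in $10,
    # the RPKM1 value in $14 and the RPKM2 value in $16.
    {
      id = $10
      gsub(/[";]/, "", id)        # drop quotes and the trailing semicolon
      sub(/\.[0-9]+$/, "", id)    # drop the version suffix, e.g. ".7"
      if (!(id in wanted)) next

      r1 = $14; r2 = $16
      gsub(/[";]/, "", r1)
      gsub(/[";]/, "", r2)
      print id, r1, r2
    }
  ' "$ids_file" "$gtf_file"
}
```

With the file names used above, it could be called as: extract_rpkm geneIDs.file rnaseq.gtf > output.file. Reading the ID list into an awk array first means the gtf file is scanned once, rather than once per gene.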

Upvotes: 1
