Reputation: 75

Converting from tsv to fasta

I have a bunch of TSV files in my folder, and for every one of them I would like to get a FASTA file where the header after the '>' sign contains the name of the file. My TSV files have 5 columns and no header line:

Thus:

Input file, called "A.coseq.table_headless.tsv":

HIV1B-pol-seed 15 MAX 1959 GTAACAGACTCACAATATGCATTAGGAATCATTCAAGC

Output file, called "A.fasta":

>A_MAX

GTAACAGACTCACAATATGCATTAGGAATCATTCAAGC

I want to run the script in bash for all the files at once. I have this script, which does not work because the awk print statement contains a curly brace:

for sample in `ls *coseq.table_headless.tsv`
do
base1=$(basename $sample "coseq.table_headless.tsv")
awk '{print ">"${base1}"_"$3"\n"$5}' ${base1}coseq.table_headless.tsv > ${base1}fasta

done

Any idea how to correct this code? Thank you very much

Upvotes: 0

Answers (3)

RomanPerekhrest

Reputation: 92854

Another awk solution:

awk '{ pfx=substr(FILENAME,1,index(FILENAME,".")-1); 
       printf(">%s_%s\n%s\n",pfx,$3,$5) > pfx".fasta" }' *coseq.table_headless.tsv

pfx holds the part of the file name before the first "."

Upvotes: 0

Ed Morton

Reputation: 203684

The other solutions posted so far have a few issues:

not closing the files as they're written will produce "too many open files" errors unless you use GNU awk,
calculating the output file name every time a line is read rather than once when the input file is opened is inefficient, and
using an unparenthesized expression on the right side of output redirection is undefined behavior, so it will only work in some awks (including GNU awk).

This will work robustly and efficiently in all awks:

awk '
    FNR==1 { close(out); f=FILENAME; sub(/\..*/,"",f); pfx=">"f"_"; out=f".fasta" }
    { print pfx $3 ORS $5 > out }
' *coseq.table_headless.tsv
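To check the script against the example from the question, a small test run could look like the sketch below. The temporary directory and the `demo_dir` variable are assumptions made only for this demonstration; the sample row is the one from the question, written with tabs between the columns.

```shell
# Work in a throwaway directory (hypothetical demo setup, not part of the question)
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# One tab-separated row with the five columns shown in the question
printf 'HIV1B-pol-seed\t15\tMAX\t1959\tGTAACAGACTCACAATATGCATTAGGAATCATTCAAGC\n' \
    > A.coseq.table_headless.tsv

# Same awk program as above: the output name is computed once per input file
awk '
    FNR==1 { close(out); f=FILENAME; sub(/\..*/,"",f); pfx=">"f"_"; out=f".fasta" }
    { print pfx $3 ORS $5 > out }
' *coseq.table_headless.tsv

# Show the result
cat A.fasta
```

For this input, A.fasta should contain the header line >A_MAX followed by the sequence line.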

Upvotes: 0

karakfa

Reputation: 67507

If the basename is the part up to the first ".", you can get rid of the loop as well.

 awk '{split(FILENAME,base,"."); 
       print ">" base[1] "_" $3 "\n" $5 > base[1]".fasta"}' *coseq.table_headless.tsv
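If you would rather keep the loop, the underlying problem is that a shell variable such as ${base1} is not expanded inside the single-quoted awk program. One common fix is to hand the value to awk with -v. The following is only a sketch: the awk variable name `base`, the `demo_dir` directory and the sample file are assumptions added for illustration, and the suffix passed to basename includes the leading dot so that the header comes out as A_MAX rather than A._MAX.

```shell
# Hypothetical demo setup: a scratch directory with one sample file
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
printf 'HIV1B-pol-seed\t15\tMAX\t1959\tGTAACAGACTCACAATATGCATTAGGAATCATTCAAGC\n' \
    > A.coseq.table_headless.tsv

# Loop over the files with a glob instead of parsing the output of ls
for sample in *coseq.table_headless.tsv
do
    # Skip the literal pattern if no file matched
    [ -e "$sample" ] || continue

    # Strip the suffix, including its leading dot, to get e.g. "A"
    base1=$(basename "$sample" ".coseq.table_headless.tsv")

    # Pass the shell value into awk as the awk variable "base"
    awk -v base="$base1" '{ print ">" base "_" $3 "\n" $5 }' "$sample" > "${base1}.fasta"
done

cat A.fasta
```

Note that awk interprets backslash escapes in values assigned with -v, so this approach assumes the file names contain no backslashes.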

Upvotes: 2
