How to convert text file with two columns to fasta format

Question

I am a bit confused by this piece of code. I have a file, testfile.txt:

Sclsc1_3349_SS1G_09805T0        TTGCGATCTATGCCGACGTTCCA
Sclsc1_8695_SS1G_14118T0        ATGGTTTCGGC
Sclsc1_12154_SS1G_05183T0       ATGGTTTCGGC
Sclsc1_317_SS1G_00317T0         ATGGTTTCGGC
Sclsc1_10094_SS1G_03122T0       ATGGTTTCGGC

I want to convert this file to the FASTA format shown below:

>Sclsc1_3349_SS1G_09805T0
TTGCGATCTATGCCGACGTTCCA
>Sclsc1_8695_SS1G_14118T0
ATGGTTTCGGC
>Sclsc1_12154_SS1G_05183T0
ATGGTTTCGGC
>Sclsc1_317_SS1G_00317T0
ATGGTTTCGGC
>Sclsc1_10094_SS1G_03122T0
ATGGTTTCGGC

Here is my Python code (run it like: python mycode.py testfile.txt outputfile.txt), but it does not produce the output I want. Can someone please help me correct this code? Thanks!

import sys

#File input
fileInput = open(sys.argv[1], "r")

#File output
fileOutput = open(sys.argv[2], "w")

#Seq count
count = 1 ;

#Loop through each line in the input file
print "Converting to FASTA..."
for strLine in fileInput:

    #Strip the endline character from each input line
    strLine = strLine.rstrip("\n")

    #Output the header
    fileOutput.write("> " + str(count) + "\n")
    fileOutput.write(strLine + "\n")

    count = count + 1
print ("Done.")

#Close the input and output file
fileInput.close()
fileOutput.close()

RomanPerekhrest · Accepted Answer

Since you are on Linux, here is a short and fast awk one-liner:

awk '{ printf ">%s\n%s\n",$1,$2 }' testfile.txt > outputfile.txt

The outputfile.txt contents:

>Sclsc1_3349_SS1G_09805T0
TTGCGATCTATGCCGACGTTCCA
>Sclsc1_8695_SS1G_14118T0
ATGGTTTCGGC
>Sclsc1_12154_SS1G_05183T0
ATGGTTTCGGC
>Sclsc1_317_SS1G_00317T0
ATGGTTTCGGC
>Sclsc1_10094_SS1G_03122T0
ATGGTTTCGGC
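
If you would rather stay in Python, the script in the question produces the wrong output because it writes a running counter as the header and then the whole input line as the sequence, instead of splitting the line into its two columns. The following is only a minimal sketch of one way to do that split. The function name two_columns_to_fasta is made up for illustration, and skipping lines with fewer than two fields is an assumption about how you want blank or malformed lines handled.

```python
import sys


def two_columns_to_fasta(lines):
    """Yield one FASTA record per 'identifier<whitespace>sequence' line.

    Hypothetical helper, not taken from the question.
    """
    for line in lines:
        # split() with no argument splits on any run of spaces or tabs
        # and also discards the trailing newline.
        fields = line.split()
        if len(fields) < 2:
            # Assumption: silently skip blank or malformed lines.
            continue
        name, sequence = fields[0], fields[1]
        yield ">%s\n%s\n" % (name, sequence)


# Only run the conversion when both file names are given on the command line.
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as file_input, open(sys.argv[2], "w") as file_output:
        file_output.writelines(two_columns_to_fasta(file_input))
```

Run it the same way as before: python mycode.py testfile.txt outputfile.txt. For the sample input it should give the same result as the awk command.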
