Reputation: 5719
I am a bit confused by this piece of code. I have a file, testfile.txt, with the following contents:
Sclsc1_3349_SS1G_09805T0 TTGCGATCTATGCCGACGTTCCA
Sclsc1_8695_SS1G_14118T0 ATGGTTTCGGC
Sclsc1_12154_SS1G_05183T0 ATGGTTTCGGC
Sclsc1_317_SS1G_00317T0 ATGGTTTCGGC
Sclsc1_10094_SS1G_03122T0 ATGGTTTCGGC
I want to convert this file to the FASTA format shown below:
>Sclsc1_3349_SS1G_09805T0
TTGCGATCTATGCCGACGTTCCA
>Sclsc1_8695_SS1G_14118T0
ATGGTTTCGGC
>Sclsc1_12154_SS1G_05183T0
ATGGTTTCGGC
>Sclsc1_317_SS1G_00317T0
ATGGTTTCGGC
>Sclsc1_10094_SS1G_03122T0
ATGGTTTCGGC
Here is my Python code (run it like this: python mycode.py testfile.txt outputfile.txt), but it does not produce the output I want. Can someone please help me correct this code? Thanks!
import sys
#File input
fileInput = open(sys.argv[1], "r")
#File output
fileOutput = open(sys.argv[2], "w")
#Seq count
count = 1 ;
#Loop through each line in the input file
print "Converting to FASTA..."
for strLine in fileInput:
    #Strip the endline character from each input line
    strLine = strLine.rstrip("\n")
    #Output the header
    fileOutput.write("> " + str(count) + "\n")
    fileOutput.write(strLine + "\n")
    count = count + 1
print ("Done.")
#Close the input and output file
fileInput.close()
fileOutput.close()
Upvotes: 0
Views: 4063
Reputation: 51
import sys
inp = open('Dataset.csv', "r")
outp = open('Book1.txt', "w")
print ("Converting")
for a in inp:
    a = a.rstrip("\n")
    # Header line: the first 6 characters of the input line
    outp.write("> " + a[0:6] + "\n")
    # Sequence line: everything from index 11 on, minus the last 4 characters
    outp.write(a[11:-4] + "\n")
print ("Done")
inp.close()
outp.close()
Upvotes: 0
Reputation: 92854
Since you are on Linux, here is a short and fast awk one-liner:
awk '{ printf ">%s\n%s\n",$1,$2 }' testfile.txt > outputfile.txt
The contents of outputfile.txt:
>Sclsc1_3349_SS1G_09805T0
TTGCGATCTATGCCGACGTTCCA
>Sclsc1_8695_SS1G_14118T0
ATGGTTTCGGC
>Sclsc1_12154_SS1G_05183T0
ATGGTTTCGGC
>Sclsc1_317_SS1G_00317T0
ATGGTTTCGGC
>Sclsc1_10094_SS1G_03122T0
ATGGTTTCGGC
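If you would rather stay in Python, the same idea as the awk one-liner (split each line on whitespace, write the first field after ">" and the second field on the next line) could be sketched roughly as follows. This is only a minimal sketch, assuming Python 3 and assuming every record is a single line of the form "identifier sequence". The names to_fasta and convert are made up for illustration and do not come from the question.

```python
import sys


def to_fasta(lines):
    """Yield one FASTA record (header line + sequence line) per input line.

    Assumption: each input line looks like "identifier sequence",
    separated by whitespace. Lines with fewer than two fields are skipped.
    """
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # blank or malformed line, nothing to write
        name, seq = fields[0], fields[1]
        yield ">%s\n%s\n" % (name, seq)


def convert(in_path, out_path):
    """Read in_path line by line and write the FASTA records to out_path."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(to_fasta(src))


# Assumed usage: python mycode.py testfile.txt outputfile.txt
if __name__ == "__main__" and len(sys.argv) == 3:
    convert(sys.argv[1], sys.argv[2])
```

The with statement closes both files even if an error occurs, and splitting on whitespace avoids depending on fixed column positions, since the identifiers in testfile.txt have different lengths.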
Upvotes: 2