Reputation: 344
The problem
I'm trying to write a script that uses Biopython's SeqIO to convert files from one format to another. Ideally, on the command line I'd write:
python3 converter.py *.ab1 *.fas
This doesn't work. However, the following works fine:
python3 converter.py *.ab1 workingexample.fas
My current code
This is my code at the moment.
import sys
from Bio import SeqIO
file = sys.argv[1]
outputfile = sys.argv[2]
with open(file, "rb") as input_handle:
    with open(outputfile, "w") as output_handle:
        sequences = SeqIO.parse(input_handle, "abi")
        count = SeqIO.write(sequences, output_handle, "fasta")
        print("Converted %i records" % count)
Desired output:
A fix for
python3 converter.py *.ab1 *.fas
so that I can run it on every file with a given extension in the same directory, writing each result to a file with the same name but the new converted extension.
(I'm not sure whether this is a bash question or a Python one, so I have tagged it as both; please correct me if I'm wrong.)
Upvotes: 0
Views: 428
Reputation: 74615
If I understand correctly, the issue is that *.ab1 matches at least one file, whereas *.fas doesn't match anything. This means that your program gets called like:
python3 converter.py first.ab1 second.ab1 third.ab1 *.fas
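and *.fas is passed through unexpanded, because by default bash leaves a pattern that matches nothing as-is (unless the nullglob option is set). So you cannot rely on *.fas being the second argument: it is the last one. Worse, sys.argv[2] is then second.ab1, which your current script opens for writing and overwrites with FASTA output.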
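To see what the script actually receives, here is a rough simulation of bash's default expansion in pure Python. It assumes a directory holding three hypothetical traces, and shell_expand is a made-up helper built on fnmatch.filter, so it only approximates bash (for example, fnmatch also matches names starting with a dot and is case-insensitive on Windows).

```python
import fnmatch


def shell_expand(pattern, names):
    """Hypothetical helper approximating bash's default glob expansion.

    bash sorts the matches; if nothing matches (and nullglob is off),
    it passes the pattern through literally.
    """
    matches = sorted(fnmatch.filter(names, pattern))
    return matches if matches else [pattern]


# Hypothetical directory contents.
directory = ["first.ab1", "second.ab1", "third.ab1", "converter.py"]

# What `python3 converter.py *.ab1 *.fas` puts into sys.argv.
argv = ["converter.py"] + shell_expand("*.ab1", directory) + shell_expand("*.fas", directory)
print(argv)     # ['converter.py', 'first.ab1', 'second.ab1', 'third.ab1', '*.fas']
print(argv[2])  # second.ab1
```

The unmatched *.fas ends up last, and argv[2] is an input file rather than an output name.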
I would argue that the shell is helping you here by expanding the glob, and that there's no need to prevent it from doing its work by e.g. enclosing your arguments in quotes.
I would suggest that you call your script like:
python3 converter.py *.ab1 fas
Then change the code to:
import sys
import os.path
from Bio import SeqIO
# Every argument except the last is an input file; the last is the new extension (no dot).
files = sys.argv[1:-1]
output_ext = sys.argv[-1]
def get_output_filename(input_filename, output_ext):
    # "first.ab1" -> "first.fas": replace only the final extension.
    root, ext = os.path.splitext(input_filename)
    return "{}.{}".format(root, output_ext)
for in_file in files:
    out_file = get_output_filename(in_file, output_ext)
    with open(in_file, "rb") as input_handle, open(out_file, "w") as output_handle:
        sequences = SeqIO.parse(input_handle, "abi")
        count = SeqIO.write(sequences, output_handle, "fasta")
    print("Converted %i records from %s" % (count, in_file))
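Because get_output_filename does not touch Biopython, you can check the naming on its own before running a batch. A quick sketch with made-up paths, repeating the helper so it runs standalone:

```python
import os.path


def get_output_filename(input_filename, output_ext):
    # Same helper as above: swap the final extension for output_ext.
    root, _ext = os.path.splitext(input_filename)
    return "{}.{}".format(root, output_ext)


print(get_output_filename("first.ab1", "fas"))           # first.fas
print(get_output_filename("runs/2021.05/a.ab1", "fas"))  # runs/2021.05/a.fas
print(get_output_filename("noext", "fas"))               # noext.fas
```

os.path.splitext only looks at the last path component, so dots in directory names are left alone.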
Since you're converting to "fasta" format no matter what the last argument is, you could simply rename the script to convert_abi_to_fasta.py and drop the last argument.
Upvotes: 1
Reputation: 3399
Does something like this work?
import sys
import glob
import os
from Bio import SeqIO
pattern = sys.argv[1]           # quoted glob pattern, e.g. '*.ab1'
output_extension = sys.argv[2]  # new extension including the dot, e.g. .fas
for f in glob.glob(pattern):
    filename, file_extension = os.path.splitext(f)
    # Append the new extension to the root; str.replace would rewrite every occurrence in the path.
    outputfile = filename + output_extension
    with open(f, "rb") as input_handle:
        with open(outputfile, "w") as output_handle:
            sequences = SeqIO.parse(input_handle, "abi")
            count = SeqIO.write(sequences, output_handle, "fasta")
            print("Converted %i records from %s" % (count, f))
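Building the output name with str.replace(file_extension, output_extension) can misfire: str.replace rewrites every occurrence of the extension in the path, and an empty extension (a file with no suffix, e.g. matched by a pattern like 'trace*') matches between every character. Joining the root from os.path.splitext with the new extension avoids both. The helper names below are hypothetical, and the paths are made up for illustration:

```python
import os.path


def replace_name(path, new_ext):
    # str.replace approach: rewrites EVERY occurrence of the extension.
    _root, ext = os.path.splitext(path)
    return path.replace(ext, new_ext)


def splitext_name(path, new_ext):
    # splitext approach: swaps only the final extension.
    root, _ext = os.path.splitext(path)
    return root + new_ext


for path in ["sample.ab1", "old.ab1/sample.ab1", "README"]:
    print(path, "->", replace_name(path, ".fas"), "|", splitext_name(path, ".fas"))
```

For "old.ab1/sample.ab1" the replace version gives "old.fas/sample.fas" (a directory that may not exist), and for "README" it gives ".fasR.fasE.fasA.fasD.fasM.fasE.fas"; the splitext version gives "old.ab1/sample.fas" and "README.fas".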
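You can call it like this, quoting the pattern so that the shell passes it through unexpanded and glob.glob() does the matching: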
python converter.py '*.ab1' .fas
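glob.glob() differs from bash in two ways worth knowing: it returns matches in arbitrary order, and a pattern that matches nothing yields an empty list rather than the literal pattern, so the loop above would silently do nothing. A self-contained check in a temporary directory (the file names are made up):

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    for name in ["b.ab1", "a.ab1", "notes.txt"]:
        open(os.path.join(d, name), "w").close()  # create empty placeholder files
    base = glob.escape(d)  # protect any glob metacharacters in the temp path
    # Sort explicitly: glob.glob() makes no ordering guarantee.
    names = [os.path.basename(p) for p in sorted(glob.glob(os.path.join(base, "*.ab1")))]
    no_match = glob.glob(os.path.join(base, "*.fas"))

print(names)     # ['a.ab1', 'b.ab1']
print(no_match)  # []
```

If an empty match should be an error, checking `if not glob.glob(pattern)` before the loop and exiting with a message is one option.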
Upvotes: 3