Biomage

Reputation: 344

Using a wildcard in the command line as sys.argv[2]

The problem

I'm trying to write a script that converts files from one format to another using SeqIO. Ideally, I'd run it from the command line like this:

python3 converter.py *.ab1 *.fas

But this doesn't work. However, when I use the following, it works fine:

python3 converter.py *.ab1 workingexample.fas

My current code

This is my code at the moment:

import sys
from Bio import SeqIO

file = sys.argv[1]
outputfile = sys.argv[2]

with open(file, "rb") as input_handle:
    with open(outputfile, "w") as output_handle:
        sequences = SeqIO.parse(input_handle, "abi")
        count = SeqIO.write(sequences, output_handle, "fasta")

print("Converted %i records" % count)

Desired output:

A way to make the following work:

python3 converter.py *.ab1 *.fas

That is, I'd like to run the script on every file in a directory that has a given extension and, for each one, write an output file with the same name but the new, converted extension.

(I'm not sure whether to tag this as a bash question or a python one, so I have tagged it as both. Please correct me if I'm wrong.)

Upvotes: 0

Views: 428

Answers (2)

Tom Fenech

Reputation: 74615

If I understand correctly, the issue is that *.ab1 matches at least one file, whereas *.fas doesn't match anything. This means that your program gets called like:

python3 converter.py first.ab1 second.ab1 third.ab1 *.fas

and *.fas is passed unexpanded. As this example shows, you cannot rely on *.fas being the second argument: it is the last argument.
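
To check what the script actually receives, a small sketch like the one below can help. The helper name unexpanded_globs is made up for illustration, and it assumes a bash-like shell with default settings, where a pattern that matches nothing is passed through literally:

```python
import sys

# Characters that mark a shell wildcard pattern
WILDCARD_CHARS = "*?["


def unexpanded_globs(args):
    """Return the arguments that still contain shell wildcard characters."""
    return [arg for arg in args if any(ch in arg for ch in WILDCARD_CHARS)]


if __name__ == "__main__":
    # Show exactly what the shell handed to the script
    print("Arguments received:", sys.argv[1:])
    print("Still look like patterns:", unexpanded_globs(sys.argv[1:]))
```

Running it in place of converter.py with the same arguments shows the expanded .ab1 names followed by the literal *.fas.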

I would argue that the shell is helping you here by expanding the glob, and that there's no need to prevent it from doing its work by e.g. enclosing your arguments in quotes.

I would suggest that you call your script like:

python3 converter.py *.ab1 fas

Then change the code to:

import sys
import os.path
from Bio import SeqIO

files = sys.argv[1:-1]
output_ext = sys.argv[-1]

def get_output_filename(input_filename, output_ext):
    root, ext = os.path.splitext(input_filename)
    return "{}.{}".format(root, output_ext)

for in_file in files:
    out_file = get_output_filename(in_file, output_ext)
    with open(in_file, "rb") as input_handle, open(out_file, "w") as output_handle:
        sequences = SeqIO.parse(input_handle, "abi")
        count = SeqIO.write(sequences, output_handle, "fasta")
    print("Converted %i records" % count)

Since you're converting to "fasta" format no matter what the last argument is, I guess you could just rename the script to convert_abi_to_fasta.py and drop the last argument.
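
A minimal sketch of that simplified script might look like this. It assumes the output should always get a fas extension; the function names fasta_name and convert are only illustrative, and Biopython is imported inside convert so that the naming helper can be tried on its own:

```python
import os.path
import sys


def fasta_name(input_filename, output_ext="fas"):
    """Swap the extension of input_filename for output_ext."""
    root, _ = os.path.splitext(input_filename)
    return "{}.{}".format(root, output_ext)


def convert(in_file):
    """Convert one ABI trace file to FASTA and return the record count."""
    from Bio import SeqIO  # requires Biopython

    out_file = fasta_name(in_file)
    with open(in_file, "rb") as input_handle, open(out_file, "w") as output_handle:
        sequences = SeqIO.parse(input_handle, "abi")
        return SeqIO.write(sequences, output_handle, "fasta")


if __name__ == "__main__":
    # Every argument is an input file; the shell has already expanded *.ab1
    for in_file in sys.argv[1:]:
        print("Converted %i records from %s" % (convert(in_file), in_file))
```

It would then be called as python3 convert_abi_to_fasta.py *.ab1, with no extension argument at all.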

Upvotes: 1

Ashish Acharya

Reputation: 3399

Does something like this work?

import sys
import glob
import os
from Bio import SeqIO

file = sys.argv[1]
output_extension = sys.argv[2]

for f in glob.glob(file):
    filename, file_extension = os.path.splitext(f)
    # Build the name from the root; str.replace would also change a matching substring earlier in the path
    outputfile = filename + output_extension

    with open(f, "rb") as input_handle:
        with open(outputfile, "w") as output_handle:
            sequences = SeqIO.parse(input_handle, "abi")
            count = SeqIO.write(sequences, output_handle, "fasta")

    print("Converted %i records" % count)

You can call it like this (the quotes stop the shell from expanding the pattern, so glob.glob receives it intact):

python converter.py '*.ab1' .fas
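
As a rough sketch of how the pattern is turned into input/output pairs inside the script, the mapping can be pulled out into a helper and checked without Biopython. The name plan_conversions is hypothetical, and the output extension is assumed to include its leading dot, as in the call above:

```python
import glob
import os


def plan_conversions(pattern, output_extension):
    """Pair every file matching pattern with its output filename."""
    pairs = []
    # sorted() only makes the order predictable; glob.glob does not guarantee one
    for path in sorted(glob.glob(pattern)):
        root, _ = os.path.splitext(path)
        pairs.append((path, root + output_extension))
    return pairs


if __name__ == "__main__":
    for in_file, out_file in plan_conversions("*.ab1", ".fas"):
        print(in_file, "->", out_file)
```

If nothing matches the pattern, the list is empty and the loop simply does nothing.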

Upvotes: 3
