Alex
Alex

Reputation: 355

How to run a python script as a loop for different files?

I have a python script as follow:

#!/usr/bin/python
from Bio import SeqIO

fasta_file = "input.fa" # Input fasta file
wanted_file = "A_ids.txt" # Input interesting sequence IDs, one per line
result_file = "A.fasta" # Output fasta file

wanted = set()
with open(wanted_file) as f:
    for line in f:
        line = line.strip()
        if line != "":
            wanted.add(line)

fasta_sequences = SeqIO.parse(open(fasta_file),'fasta')
with open(result_file, "w") as f:
    for seq in fasta_sequences:
        if seq.id in wanted:
            SeqIO.write([seq], f, "fasta")

I would like to run the script above for the same input file, but for 40 different wanted_files - with different names - A_ids.txt, B_ids.txt, etc. And I would like to have their respective different outputs - A.fasta, B.fasta, etc.

Do I need to change my python script or I need to create a loop to run it for all my wanted files?

thanks

Upvotes: 0

Views: 3425

Answers (3)

Benjy Malca
Benjy Malca

Reputation: 637

I think simpler way could be storing the 40 filenames in a file (in the code: wanted_filenames_file), store them in an array (wanted_files) and loop along each one of the files :

# !/usr/bin/python
from Bio import SeqIO

fasta_file = "input.fa"  # Input fasta file
wanted_filenames_file = "filenames.txt"
with open(wanted_filenames_file) as f:
    wanted_files = f.readlines()
result_file = []  # Output fasta file
for wanted_file in wanted_files:
    wanted = set()
    with open(wanted_file) as f:
        for line in f:
            line = line.strip()
            if line != "":
                wanted.add(line)

    fasta_sequences = SeqIO.parse(open(fasta_file), 'fasta')
    result_file = wanted_file.replace("_ids.txt", ".fasta")
    with open(result_file, "w") as f:
        for seq in fasta_sequences:
            if seq.id in wanted:
                SeqIO.write([seq], f, "fasta")

Upvotes: 0

SolarPixelGaming
SolarPixelGaming

Reputation: 113

I agree with @BlackVegetable. Set it to use command line arguments, by doing something like this:

#!/usr/bin/python
from Bio import SeqIO

import sys # for sys.argv

fasta_file = sys.argv[1] # This is now going to be name.fa, the fasta file
wanted_file = sys.argv[2] # This is now going to be name_ids.txt, or whatever you passed
# as an argument
result_file = sys.argv[3] # Output fasta file, now passed as arg

wanted = set()
with open(wanted_file) as f:
    for line in f:
        line = line.strip()
        if line != "":
            wanted.add(line)

fasta_sequences = SeqIO.parse(open(fasta_file),'fasta')
with open(result_file, "w") as f:
    for seq in fasta_sequences:
        if seq.id in wanted:
            SeqIO.write([seq], f, "fasta")

Then you could call the program with python input.fa A_ids.txt A.fasta, in your case. Or, python inputB.fa B_ids.txt B.fasta.

Upvotes: 4

BlackVegetable
BlackVegetable

Reputation: 13034

Consider having this program taking commandline options. This would allow you to read the wanted_file name from the commandline as an argument and you could deduce the appropriate output file name by parsing the given argument and following a pattern (such as replace extension given with .fasta) or having the output pattern be another command line argument of some sort.

You could call your program as python my_script.py A_ids.txt and loop over that via bash. You could also choose to allow for a variable number of arguments, each of which would invoke your logic for the given name.

My preference for dealing with command line arguments is https://docs.python.org/3.3/library/argparse.html and https://docs.python.org/2/library/argparse.html depending on your python version.

(Additionally, if you take the path of using a single command line argument for the wanted_file, you could simply output the contents to stdout via print or similar functions and use a redirection operator in the command line to send the output to a filename provided there: python my_script.py A_ids.txt > A.fasta)

Upvotes: 0

Related Questions