Reputation: 355
I have a Python script as follows:
#!/usr/bin/python
from Bio import SeqIO
fasta_file = "input.fa"  # Input fasta file
wanted_file = "A_ids.txt"  # Input interesting sequence IDs, one per line
result_file = "A.fasta"  # Output fasta file
wanted = set()
with open(wanted_file) as f:
    for line in f:
        line = line.strip()
        if line != "":
            wanted.add(line)
fasta_sequences = SeqIO.parse(open(fasta_file),'fasta')
with open(result_file, "w") as f:
    for seq in fasta_sequences:
        if seq.id in wanted:
            SeqIO.write([seq], f, "fasta")
I would like to run the script above on the same input file, but for 40 different wanted files with different names (A_ids.txt, B_ids.txt, etc.), and get a separate output for each (A.fasta, B.fasta, etc.).
Do I need to change my Python script, or should I create a loop to run it for all my wanted files?
Thanks.
Upvotes: 0
Views: 3425
Reputation: 637
I think a simpler way would be to store the 40 filenames in a file (wanted_filenames_file in the code), read them into a list (wanted_files), and loop over each of the files:
#!/usr/bin/python
from Bio import SeqIO
fasta_file = "input.fa"  # Input fasta file
wanted_filenames_file = "filenames.txt"  # One wanted-IDs file name per line
with open(wanted_filenames_file) as f:
    # Strip the trailing newline from each name and skip blank lines
    wanted_files = [line.strip() for line in f if line.strip()]
for wanted_file in wanted_files:
    wanted = set()
    with open(wanted_file) as f:
        for line in f:
            line = line.strip()
            if line != "":
                wanted.add(line)
    result_file = wanted_file.replace("_ids.txt", ".fasta")  # Output fasta file
    # Re-open the FASTA file for every ID list: the parser is used up after one pass
    with open(fasta_file) as fasta_handle, open(result_file, "w") as out:
        for seq in SeqIO.parse(fasta_handle, "fasta"):
            if seq.id in wanted:
                SeqIO.write([seq], out, "fasta")
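One detail worth checking in a loop like the one above: readlines() keeps the trailing newline on every entry, so names read that way usually need stripping before they are passed to open(). A minimal sketch of the difference, using a temporary file as a stand-in for filenames.txt (its contents here are made up for illustration):

```python
import os
import tempfile

# Write a small stand-in for filenames.txt (hypothetical contents).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("A_ids.txt\nB_ids.txt\n\n")
    list_path = tmp.name

# readlines() returns each line with its newline still attached.
with open(list_path) as f:
    raw_names = f.readlines()

# Stripping each line and skipping blanks gives usable file names.
with open(list_path) as f:
    clean_names = [line.strip() for line in f if line.strip()]

os.remove(list_path)

print(raw_names)
print(clean_names)
```
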
Upvotes: 0
Reputation: 113
I agree with @BlackVegetable. Set it to use command line arguments, by doing something like this:
#!/usr/bin/python
from Bio import SeqIO
import sys  # for sys.argv
fasta_file = sys.argv[1]   # This is now going to be name.fa, the fasta file
wanted_file = sys.argv[2]  # This is now going to be name_ids.txt, or whatever you passed
                           # as an argument
result_file = sys.argv[3]  # Output fasta file, now passed as arg
wanted = set()
with open(wanted_file) as f:
    for line in f:
        line = line.strip()
        if line != "":
            wanted.add(line)
fasta_sequences = SeqIO.parse(open(fasta_file),'fasta')
with open(result_file, "w") as f:
    for seq in fasta_sequences:
        if seq.id in wanted:
            SeqIO.write([seq], f, "fasta")
Then you could call the program with python my_script.py input.fa A_ids.txt A.fasta in your case (where my_script.py stands for whatever you named the script). Or, python my_script.py inputB.fa B_ids.txt B.fasta.
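To cover all 40 ID lists without typing each call by hand, the calls can be generated by a small driver. This is only a sketch under a few assumptions: the script is saved as my_script.py (a placeholder name), the ID lists all end in _ids.txt, and build_commands and run_all are hypothetical helper names.

```python
import glob
import subprocess
import sys

def build_commands(fasta_file, wanted_files, script="my_script.py", suffix="_ids.txt"):
    """Return one command line per ID list, e.g. A_ids.txt -> A.fasta."""
    commands = []
    for wanted_file in wanted_files:
        if not wanted_file.endswith(suffix):
            continue  # skip anything that does not follow the naming pattern
        result_file = wanted_file[:-len(suffix)] + ".fasta"
        commands.append([sys.executable, script, fasta_file, wanted_file, result_file])
    return commands

def run_all(fasta_file):
    """Run the script once for every *_ids.txt file in the current directory."""
    wanted_files = sorted(glob.glob("*_ids.txt"))
    for command in build_commands(fasta_file, wanted_files):
        subprocess.check_call(command)  # raises CalledProcessError if a run fails
```

Calling run_all("input.fa") from the directory that holds the ID lists would then run the script once per list.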
Upvotes: 4
Reputation: 13034
Consider having this program take command-line options. This would allow you to read the wanted_file name from the command line as an argument. You could then deduce the appropriate output file name by parsing the given argument and following a pattern (such as replacing the given extension with .fasta), or by making the output pattern another command-line argument of some sort.
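As a rough illustration of deducing the output name from the argument, the sketch below maps A_ids.txt to A.fasta. The helper name output_name_for and the _ids.txt suffix are assumptions taken from the file names in the question, not part of any library.

```python
import os

def output_name_for(wanted_file, suffix="_ids.txt", new_ext=".fasta"):
    """Derive an output FASTA name from an ID-list name (hypothetical helper)."""
    if wanted_file.endswith(suffix):
        # A_ids.txt -> A.fasta
        return wanted_file[:-len(suffix)] + new_ext
    # Otherwise just swap whatever extension is present.
    root, _ext = os.path.splitext(wanted_file)
    return root + new_ext

print(output_name_for("A_ids.txt"))
print(output_name_for("B_ids.txt"))
```
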
You could call your program as python my_script.py A_ids.txt and loop over that via bash. You could also choose to allow for a variable number of arguments, each of which would invoke your logic for the given name.
My preference for dealing with command-line arguments is argparse: https://docs.python.org/3.3/library/argparse.html or https://docs.python.org/2/library/argparse.html, depending on your Python version.
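A minimal argparse sketch that accepts a variable number of ID lists might look like the following. The --fasta option, its default, and the wanted_files argument name are illustrative assumptions; only the argparse calls themselves come from the standard library.

```python
import argparse

def build_parser():
    """Build a parser taking one or more ID-list files (names are illustrative)."""
    parser = argparse.ArgumentParser(
        description="Extract wanted sequences from a FASTA file.")
    parser.add_argument("--fasta", default="input.fa",
                        help="input FASTA file (default: input.fa)")
    parser.add_argument("wanted_files", nargs="+",
                        help="one or more ID lists, e.g. A_ids.txt B_ids.txt")
    return parser

# Parse an explicit list here; a real script would call parse_args() with no
# arguments so that the values come from the command line.
args = build_parser().parse_args(["A_ids.txt", "B_ids.txt"])
print(args.fasta)
print(args.wanted_files)
```

Each entry of args.wanted_files could then be passed through the existing filtering logic in a for loop.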
(Additionally, if you take the path of using a single command-line argument for the wanted_file, you could simply write the contents to stdout via print or similar functions, and use a redirection operator on the command line to send the output to a file name provided there: python my_script.py A_ids.txt > A.fasta)
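The stdout approach can be sketched as below. To keep the example free of dependencies it writes FASTA text by hand from made-up (id, sequence) pairs instead of using Biopython; in the real script the same idea would presumably mean passing sys.stdout as the handle to SeqIO.write. The function name write_wanted and the sample records are hypothetical.

```python
import sys

def write_wanted(records, wanted, handle):
    """Write records whose ID is in `wanted` to `handle` as FASTA text.

    Returns the number of records written.
    """
    count = 0
    for seq_id, sequence in records:
        if seq_id in wanted:
            handle.write(">%s\n%s\n" % (seq_id, sequence))
            count += 1
    return count

# Made-up records standing in for the parsed FASTA file.
records = [("seq1", "ACGT"), ("seq2", "GGCC"), ("seq3", "TTAA")]

# Writing to sys.stdout lets the shell decide where the output goes,
# e.g. with "> A.fasta" on the command line.
write_wanted(records, {"seq1", "seq3"}, sys.stdout)
```
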
Upvotes: 0