Reputation: 141
I am trying to modify my .fasta files from this:
>YP_009208724.1 hypothetical protein ADP65_00072 [Achromobacter phage phiAxp-3]
MSNVLLKQ...
>YP_009220341.1 terminase large subunit [Achromobacter phage phiAxp-1]
MRTPSKSE...
>YP_009226430.1 DNA packaging protein [Achromobacter phage phiAxp-2]
MMNSDAVI...
to this:
>Achromobacter phage phiAxp-3
MSNVLLKQ...
>Achromobacter phage phiAxp-1
MRTPSKSE...
>Achromobacter phage phiAxp-2
MMNSDAVI...
I already have a script that does this for a single file:
with open('Achromobacter.fasta', 'r') as fasta_file:
    out_file = open('./fastas3/Achromobacter.fasta', 'w')
    for line in fasta_file:
        line = line.rstrip()
        if '[' in line:
            line = line.split('[')[-1]
            out_file.write('>' + line[:-1] + "\n")
        else:
            out_file.write(str(line) + "\n")
but I can't manage to automate the process for all 120 files in my folder.
I tried using glob.glob, but I can't seem to make it work:
import glob
for fasta_file in glob.glob('*.fasta'):
    outfile = open('./fastas3/'+fasta_file, 'w')
    with open(fasta_file, 'r'):
        for line in fasta_file:
            line = line.rstrip()
            if '[' in line:
                line2 = line.split('[')[-1]
                outfile.write('>' + line2[:-1] + "\n")
            else:
                outfile.write(str(line) + "\n")
It gives me this output:
A
c
i
n
e
t
o
b
a
c
t
e
r
.
f
a
s
t
a
I also managed to get a list of all the files in the folder, but I can't open the files using the entries in that list.
import os
file_list = []
for file in os.listdir("./fastas2/"):
    if file.endswith(".fasta"):
        file_list.append(file)
Upvotes: 3
Views: 4870
Reputation: 1385
Since you can already transform the contents of a single file, all that is left is to automate the process. We turned the single-file code into a function and removed the second file handle, so each file is opened only once, for both reading and writing.
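def file_changer(filename):
    data_to_put = ''
    with open(filename, 'r+') as fasta_file:
        for line in fasta_file.readlines():
            line = line.rstrip()
            if '[' in line:
                # keep only the text after the last '[' and drop the closing ']'
                line = line.split('[')[-1]
                data_to_put += '>' + str(line[:-1]) + "\n"
            else:
                data_to_put += str(line) + "\n"
        # rewind first, otherwise the new text is appended after the old content
        fasta_file.seek(0)
        fasta_file.write(data_to_put)
        # cut off whatever remains of the old, longer content
        fasta_file.truncate()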
Now we need to iterate over all your files, so let's use the glob module for it:
import glob
for file in glob.glob('*.fasta'):
    file_changer(file)
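Because this approach overwrites the original files, it may be worth keeping a backup of each one. A minimal sketch, assuming the standard-library fileinput module is acceptable here; the function name strip_headers_in_place and the '.bak' extension are illustrative choices, not part of the original code:

```python
import fileinput
import glob


def strip_headers_in_place(pattern='*.fasta', backup='.bak'):
    """Rewrite FASTA headers in place, keeping a backup copy of each file.

    The function name and the default backup extension are hypothetical.
    """
    files = glob.glob(pattern)
    if not files:
        # fileinput would fall back to reading stdin for an empty list
        return []
    # With inplace=True, print() writes into the file currently being read;
    # the untouched original is kept as <name><backup>.
    with fileinput.input(files=files, inplace=True, backup=backup) as lines:
        for line in lines:
            line = line.rstrip()
            if line.startswith('>') and '[' in line:
                # text after the last '[', minus the closing ']'
                line = '>' + line.split('[')[-1][:-1]
            print(line)
    return files
```

Calling strip_headers_in_place() in the folder with the .fasta files would leave, for example, Achromobacter.fasta.bak next to the rewritten Achromobacter.fasta.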
Upvotes: 2
Reputation: 602385
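You are iterating over the file name, which gives you all the characters in the name instead of the lines of the file. The line with open(fasta_file, 'r'): opens the file but never binds the file object to a name, so for line in fasta_file still loops over the string. Here is a corrected version of the code: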
import glob
for fasta_file_name in glob.glob('*.fasta'):
    with open(fasta_file_name, 'r') as fasta_file, \
         open('./fastas3/' + fasta_file_name, 'w') as outfile:
        for line in fasta_file:
            line = line.rstrip()
            if '[' in line:
                line2 = line.split('[')[-1]
                outfile.write('>' + line2[:-1] + "\n")
            else:
                outfile.write(str(line) + "\n")
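To make the difference concrete, here is a small illustration of what each kind of loop yields. It uses io.StringIO as a stand-in for an opened file, and the sample content is made up for the demo:

```python
import io

file_name = 'Acinetobacter.fasta'

# Iterating over a str yields one character at a time.
from_name = [item for item in file_name]

# Iterating over a file object yields whole lines.
# io.StringIO stands in for an opened file; the content is invented.
fake_file = io.StringIO('>some header\nMSNV\n')
from_file = [item.rstrip() for item in fake_file]
```

Here from_name starts with 'A', 'c', 'i', 'n', which matches the output in the question, while from_file holds the two lines of the fake file.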
As an alternative to the Python script, you can simply use sed from the command line:
sed -i 's/^>.*\[\(.*\)\].*$/>\1/' *.fasta
This will modify all files in place, so consider copying them first.
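If you prefer to stay in Python, the same substitution can be expressed with the re module. This is only a sketch of an equivalent pattern; the names HEADER_RE and clean_header are made up for illustration:

```python
import re

# The sed pattern translated to Python regex syntax:
# header lines start with '>' and the group captures the text
# between the last '[' and the last ']' on the line.
HEADER_RE = re.compile(r'^>.*\[(.*)\].*$')


def clean_header(line):
    """Reduce a FASTA header to '>' plus its bracketed organism name.

    Lines that do not match (for example sequence lines) are returned unchanged.
    """
    return HEADER_RE.sub(r'>\1', line)
```

Unlike the '[' in line check, this only touches lines that begin with '>', so a stray bracket in a sequence line would be left alone.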
Upvotes: 1