sreisman

Reputation: 668

Opening multiple python files from folder

I am trying to take a folder containing 9 files, each holding FASTA records for a separate gene, and remove duplicate records. I want the script to be called with the folder that contains the gene files as the first parameter and, as the second, the name of a new folder to write the de-duplicated files to. However, when the files are stored in a folder called results within the current directory, I cannot open any of the gene files in that folder to process them for duplicates. I have searched around, and it seems that I should be able to call Python's open() function with a string containing the file name, like this:

input_handle = open(f, "r")

This line is not allowing me to open the file to read its contents. I think it may have something to do with the type of f, which is reported as 'str' when I call type(f).

Also, if I use the full path:

input_handle = open('~/Documents/Research/Scala/hiv-biojava-scala/results/rev.fa', "r")

It says that no such file exists. I have checked my spelling and I am sure that the file does exist. I get the same "file does not exist" error if I pass its name as a raw string:

input_handle = open(r'~/Documents/Research/Scala/hiv-biojava-scala/results/rev.fa', "r")

Or, if I call it as follows, it says that no global name results exists:

input_handle = open(os.path.join(os.curdir,results/f), "r")

Here is the full code. If anybody knows what the problem is I would really appreciate any help that you could offer.

#!/usr/bin/python
import os
import os.path
import sys
import re
from Bio import SeqIO

def processFiles(files) :
    for f in files:
        process(f)

def process(f):
    input_handle = open(f, "r")
    records = list(SeqIO.parse(input_handle, "fasta"))
    print records
    i = 0
    while i < len(records)-1:
        temp = records[i]
        next = records[i+1]
        if (next.id == temp.id) :
            print "duplicate found at " + next.id
            if (len(next.seq) < len(temp.seq)) :
                records.pop(i+1)
            else :
                records.pop(i)
        i = i + 1


    output_handle = open("out.fa", "w")
    for record in records:
        SeqIO.write(records, output_handle, "fasta")

    input_handle.close()



def main():
    input_folder = sys.argv[1]
    out_folder = sys.argv[2]
    if os.path.exists(out_folder):
        print("Folder %s exists; please specify empty folder or new one" % out_folder)
        sys.exit(1)
    os.makedirs(out_folder)
    files = os.listdir(input_folder)
    print files
    processFiles(files)



main()

Upvotes: 2

Views: 712

Answers (1)

kirbyfan64sos

Reputation: 10727

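Try input_handle = open(os.path.join(os.getcwd(), "results", f), "r"). Note that os.getcwd has to be called, and that results has to be a quoted string rather than a bare name. os.curdir does not give you the current directory's path; it is just the string ".". See mail.python.org/pipermail/python-list/2012-September/631864.html.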
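More generally, os.listdir() returns bare file names rather than paths, and open() does not expand ~, so each name has to be joined onto its folder before it is opened. Here is a minimal sketch of that idea, assuming the script keeps its existing process(f) function. The helper name list_input_paths is hypothetical and is not part of the question's code or of any library:

```python
import os


def list_input_paths(input_folder):
    """Return the full paths of the regular files inside input_folder.

    Hypothetical helper for illustration only.
    """
    # open() treats "~" as a literal character; expanduser() replaces a
    # leading "~" with the user's home directory.
    folder = os.path.expanduser(input_folder)
    paths = []
    for name in sorted(os.listdir(folder)):
        # os.listdir() yields names such as "rev.fa", not "results/rev.fa",
        # so the folder has to be prepended before the file can be opened.
        path = os.path.join(folder, name)
        if os.path.isfile(path):  # skip sub-directories
            paths.append(path)
    return paths
```

In main(), calling processFiles(list_input_paths(input_folder)) would then hand full paths to process(), so open(f, "r") should find the files no matter which directory the script is run from.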

Upvotes: 2
