Meritxell Riera

Reputation: 1

Writing several files according to an element of an original file

I need to split a file in BED format, which contains coordinates for all chromosomes in a genome, into separate files, one per chromosome. I tried the approach below, but it doesn't work: it doesn't create any files. Any ideas why this happens, or alternative approaches to solve this problem?

import sys

def make_out_file(dir_path, chr_name, extension):

    file_name = dir_path + "/" + chr_name + extension
    out_file = open(file_name, "w")
    out_file.close()
    return file_name

def append_output_file(line, out_file):

    with open(out_file, "a") as f:
        f.write(line)
    f.close()

in_name = sys.argv[1]
dir_path = sys.argv[2]

with open(in_name, "r") as in_file:

    file_content = in_file.readlines()
    chr_dict = {}
    out_file_dict = {}
    line_count = 0
    for line in file_content[:0]:
        line_count += 1
        elems = line.split("\t")
        chr_name = elems[0]
        chr_dict[chr_name] += 1
        if chr_dict.get(chr_name) = 1:
            out_file = make_out_file(dir_path, chr_name, ".bed")
            out_file_dict[chr_name] = out_file
            append_output_file(line, out_file)
        elif chr_dict.get(chr_name) > 1:
            out_file = out_file_dict.get(chr_name)
            append_output_file(line, out_file)
        else:
            print "There's been an Error"


in_file.close()

Upvotes: 0

Views: 20

Answers (1)

mhawke

Reputation: 87084

This line:

for line in file_content[:0]:

says to iterate over an empty list. The empty list comes from the slice [:0] which says to slice from the beginning of the list to just before the first element. Here's a demonstration:

>>> l = ['line 1\n', 'line 2\n', 'line 3\n']
>>> l[:0]
[]
>>> l[:1]
['line 1\n']

Because the list is empty, no iteration takes place, so the code in the body of your for loop is not executed.

To iterate over each line of the file you do not need the slice:

for line in file_content:

However, it is better still to iterate over the file object itself, as this does not require the whole file to be read into memory first:

with open(in_name, "r") as in_file:
    chr_dict = {}
    out_file_dict = {}
    line_count = 0
    for line in in_file:
        ...

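Beyond that, the body of the for loop has several other problems, including syntax errors, which you can then start debugging. For example, if chr_dict.get(chr_name) = 1: uses assignment (=) where a comparison (==) is needed, and chr_dict[chr_name] += 1 raises a KeyError the first time a chromosome name is seen, because that key is not yet in the dictionary.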
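As a starting point for that debugging, here is a minimal sketch of one way the loop could be restructured. It is not the only approach, and it rests on some assumptions: the function name split_bed_by_chromosome and its returned dictionary of counts are made up for illustration, the input is assumed to be tab-separated with the chromosome name in the first column, and blank lines are simply skipped. It keeps one output file open per chromosome rather than reopening a file for every line.

```python
import os


def split_bed_by_chromosome(in_name, dir_path, extension=".bed"):
    """Copy each line of in_name to <dir_path>/<chromosome><extension>.

    Hypothetical helper: returns a dict mapping each chromosome name
    to the number of lines written for it.
    """
    out_files = {}  # chromosome name -> open output file object
    counts = {}     # chromosome name -> number of lines written
    try:
        with open(in_name, "r") as in_file:
            for line in in_file:  # iterate over the file, no slice needed
                if not line.strip():
                    continue  # assumption: blank lines can be ignored
                chr_name = line.split("\t")[0]
                if chr_name not in out_files:
                    # First time this chromosome is seen: create its file.
                    path = os.path.join(dir_path, chr_name + extension)
                    out_files[chr_name] = open(path, "w")
                    counts[chr_name] = 0
                out_files[chr_name].write(line)
                counts[chr_name] += 1
    finally:
        # Close every output file, even if an error occurred part-way.
        for f in out_files.values():
            f.close()
    return counts
```

With the variables from your script, you could call it as split_bed_by_chromosome(in_name, dir_path). The code avoids the print statement, so it should behave the same under Python 2 and Python 3. Note that it holds one open file per chromosome, which is fine for a few dozen chromosomes but could hit the operating system's open-file limit for an assembly with thousands of scaffolds.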

Upvotes: 1
