Spyros

Reputation: 259

Writing multiple lists to multiple output files

I am working with datasets stored in large text files. For the analysis I am carrying out, I open the files, extract parts of the dataset and compare the extracted subsets. My code works like so:

from math import ceil

with open("seqs.txt","rb") as f:
    f = f.readlines()

assert type(f) == list, "ERROR: file object not converted to list"

fives = int( ceil(0.05*len(f)) ) 
thirds = int( ceil(len(f)/3) )

## top/bottom 5% of dataset
low_5=f[0:fives]
top_5=f[-fives:]

## top/bottom 1/3 of dataset
low_33=f[0:thirds]
top_33=f[-thirds:]

## Write lists to file
# top-5
with open("high-5.out","w") as outfile1:
    for i in top_5:
        outfile1.write("%s" %i)
# low-5
with open("low-5.out","w") as outfile2:
    for i in low_5:
        outfile2.write("%s" %i)
# top-33
with open("high-33.out","w") as outfile3:
    for i in top_33:
        outfile3.write("%s" %i)
# low-33        
with open("low-33.out","w") as outfile4:
    for i in low_33:
        outfile4.write("%s" %i)

I am trying to find a cleverer way to automate writing the lists out to files. In this case there are only four, but in future cases I may end up with as many as 15-25 lists, so I would like a function to take care of this. I wrote the following:

def write_to_file(*args):
    for i in args:
        with open(".out", "w") as outfile:
            outfile.write("%s" %i)

but the resulting file only contains the final list when I call the function like so:

write_to_file(low_33,low_5,top_33,top_5)

I understand that I have to define an output file for each list (which I am not doing in the function above), I'm just not sure how to implement this. Any ideas?

Upvotes: 1

Views: 180

Answers (5)

Hai Vu

Reputation: 40688

Don't try to be clever. Instead, aim for code that is readable and easy to understand. You can group repeated code into a function, for example:

from math import ceil

def save_to_file(data, filename):
    # Text mode ('w') matches the text-mode read below; 'wb' would
    # reject str items on Python 3.
    with open(filename, 'w') as f:
        for item in data:
            f.write('{}'.format(item))

with open('data.txt') as f:
    numbers = list(f)

five_percent = int(len(numbers) * 0.05)
thirty_three_percent = int(ceil(len(numbers) / 3.0))
# Why not: thirty_three_percent = int(len(numbers) * 0.33)
save_to_file(numbers[:five_percent], 'low-5.out')
save_to_file(numbers[-five_percent:], 'high-5.out')
save_to_file(numbers[:thirty_three_percent], 'low-33.out')
save_to_file(numbers[-thirty_three_percent:], 'high-33.out')

Update

If you have quite a number of lists to write, then it makes sense to use a loop. I suggest having two functions, save_top_n_percent and save_low_n_percent, to help with the job. They contain a little duplicated code, but separating them into two functions makes the code clearer and easier to understand.

def save_to_file(data, filename):
    with open(filename, 'w') as f:  # text mode, matching the read below
        for item in data:
            f.write(item)

def save_top_n_percent(n, data):
    n_percent = int(len(data) * n / 100.0)
    save_to_file(data[-n_percent:], 'top-{}.out'.format(n))

def save_low_n_percent(n, data):
    n_percent = int(len(data) * n / 100.0)
    save_to_file(data[:n_percent], 'low-{}.out'.format(n))

with open('data.txt') as f:
    numbers = list(f)

for n_percent in [5, 33]:
    save_top_n_percent(n_percent, numbers)
    save_low_n_percent(n_percent, numbers)
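
One caveat with the slicing above: when the computed count is zero (for example 5% of fewer than 20 lines), data[-0:] is the same as data[0:], so the "top" file would receive the whole dataset instead of nothing. A minimal sketch of a guard follows; the helper names count_for_percent, low_n_percent and top_n_percent are made up for illustration and return the slices instead of writing files.

```python
def count_for_percent(n, total):
    """Number of items that make up n percent of total, truncated."""
    return int(total * n / 100.0)


def low_n_percent(n, data):
    # data[:0] is already an empty list, so no guard is needed here.
    return data[:count_for_percent(n, len(data))]


def top_n_percent(n, data):
    count = count_for_percent(n, len(data))
    # data[-0:] would be the whole list, so handle a zero count explicitly.
    return data[-count:] if count else []


# Made-up sample data: ten lines.
lines = ["line {}\n".format(i) for i in range(10)]

print(len(top_n_percent(5, lines)))   # 5% of 10 truncates to 0 items
print(len(top_n_percent(33, lines)))  # 33% of 10 truncates to 3 items
```

The same guard could be placed inside save_top_n_percent before it calls save_to_file.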

Upvotes: 1

NDevox

Reputation: 4086

Make your variable names match your filenames and then use a dictionary to hold them instead of keeping them in the global namespace:

data = {
    'high_5': top_5,    # the lists built in the question
    'low_5': low_5,
    'high_33': top_33,
    'low_33': low_33,
}

for key in data:
    with open('{}.out'.format(key), 'w') as output:
        for i in data[key]:
            output.write(i)

This keeps your data in a single, easy-to-use place, and assuming you want to apply the same actions to every list, you can continue using the same paradigm.

As mentioned by PM2Ring below, it is advisable to use underscores (as you do in the variable names) rather than dashes (as you do in the filenames), because the dictionary keys can then be passed as keyword arguments to a writing function:

write_to_file(**data)

This would equate to:

write_to_file(low_5=f[:fives], high_5=f[-fives:],...) # and the rest of the data

From this you could use one of the functions defined by the other answers.
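
For instance, a keyword-argument version of the writer might look like the sketch below. This is only an illustration under some assumptions: the out_dir parameter is an addition so the example can write into a temporary directory, the sample lists are made up, and a list could not itself be named out_dir.

```python
import os
import tempfile


def write_to_file(out_dir=".", **named_lists):
    """Write each keyword argument to <out_dir>/<keyword>.out."""
    for name, lines in named_lists.items():
        path = os.path.join(out_dir, "{}.out".format(name))
        with open(path, "w") as outfile:
            # writelines emits every item as-is; the items are assumed to
            # already end with a newline, as lines read from a file do.
            outfile.writelines(lines)


# Made-up sample data standing in for the real slices.
data = {
    "low_5": ["a\n", "b\n"],
    "high_5": ["y\n", "z\n"],
}

out_dir = tempfile.mkdtemp()
write_to_file(out_dir, **data)
print(sorted(os.listdir(out_dir)))
```

Calling write_to_file(**data) without out_dir writes into the current directory, which matches the call shown above.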

Upvotes: 1

riddler

Reputation: 467

You are creating a file called '.out' and overwriting it each time.

def write_to_file(*args):
    for i in args:
        filename = i + ".out"
        # Look up the module-level variable whose name is the string i
        contents = globals()[i]
        with open(filename, "w") as outfile:
            outfile.writelines(contents)


write_to_file("low_33", "low_5", "top_33", "top_5")

https://stackoverflow.com/a/6504497/3583980 (variable name from a string)

This will create low_33.out, low_5.out, top_33.out, top_5.out and their contents will be the lists stored in these variables.
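
The overwriting itself can be reproduced in isolation. A minimal sketch, assuming a throwaway temporary directory and a made-up file name demo.out:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.out")

for text in ["first\n", "second\n", "third\n"]:
    # Mode "w" truncates the file every time it is opened, so each pass
    # discards whatever the previous pass wrote.
    with open(path, "w") as outfile:
        outfile.write(text)

with open(path) as infile:
    result = infile.read()

print(repr(result))  # 'third\n'
```

Only the last write survives, which is why the original function ends up with a single file holding the final list.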

Upvotes: 0

Songy

Reputation: 851

On this line you are opening up a file called .out each time and writing to it.

with open(".out", "w") as outfile:

You need to make the file name unique for each i in args. You can achieve this by passing each argument as a list that contains the file name and the data.

def write_to_file(*args):
    for i in args:
        with open("%s.out" % i[0], "w") as outfile:
            outfile.writelines(i[1])  # write the items, not the list's repr

And pass in arguments like so...

write_to_file(["low_33",low_33],["low_5",low_5],["top_33",top_33],["top_5",top_5])

Upvotes: 0

mdml

Reputation: 22882

You could have one output file per argument by incrementing a counter for each argument. For example:

def write_to_file(*args):
    for index, i in enumerate(args):
        with open("{}.out".format(index+1), "w") as outfile:
            outfile.writelines(i)  # write the items, not the list's repr

The example above will create output files "1.out", "2.out", "3.out", and "4.out".

Alternatively, if you had specific names you wanted to use (as in your original code), you could do something like the following:

def write_to_file(args):
    for name, data in args:
        with open("{}.out".format(name), "w") as outfile:
            outfile.writelines(data)  # write the items, not the list's repr

args = [('low-33', low_33), ('low-5', low_5), ('high-33', top_33), ('high-5', top_5)]
write_to_file(args)

which would create output files "low-33.out", "low-5.out", "high-33.out", and "high-5.out".
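
One thing to keep in mind when writing a whole list in a single call: formatting the list with "%s" yields its repr (brackets, quotes and escaped newlines), whereas writelines emits the items unchanged. A small sketch of the difference, using io.StringIO as a stand-in for a real file (the sample lines are made up):

```python
import io

lines = ["first\n", "second\n"]

# Formatting the list itself produces its repr as one string.
as_repr = io.StringIO()
as_repr.write("%s" % lines)

# writelines writes each item exactly as it is.
as_lines = io.StringIO()
as_lines.writelines(lines)

print(repr(as_repr.getvalue()))
print(repr(as_lines.getvalue()))
```

If the output files should look like slices of the input file, the second form (or a loop that writes item by item, as in the question's original code) is the one that preserves the lines.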

Upvotes: 1
