hobscrk777

Reputation: 2647

Create Generator From Lines of Multiple Files

I have a master file and a set of subsidiary files, but I don't know the name of the subsidiary files until I look in the master file.

The master file contains two columns: some data and a second file name, e.g.,

data1_from_master   hidden_file1
data2_from_master   hidden_file2
data3_from_master   hidden_file1
data4_from_master   hidden_file3
data5_from_master   hidden_file1

What I want to do is create a generator that yields an element from the first column of the master file, and then a line of data from one of the subsidiary files. For example,

data1_from_master    line1_from_file1
data2_from_master    line1_from_file2
data3_from_master    line2_from_file1
data4_from_master    line1_from_file3
data5_from_master    line3_from_file1

The number of lines in the master file is equal to the sum of the number of lines in all the subsidiary files, so once the master file has been traversed, all of the subsidiary files will have been traversed as well.
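The pairing only works if a stricter, per-file version of that invariant holds: each subsidiary file must have exactly as many lines as the number of times the master file names it. Below is a minimal sketch that checks this before iterating. It assumes the whitespace-separated two-column layout shown above and file names that resolve relative to the current directory. `check_line_counts` is a hypothetical helper name, not part of any library.

```python
from collections import Counter


def check_line_counts(master_path):
    """Hypothetical helper: return True if every subsidiary file has exactly
    as many lines as the master file references it.

    Assumes each non-blank master line is "<data> <file name>".
    """
    with open(master_path) as master:
        # Count how often each subsidiary file name appears in column 2.
        expected = Counter(line.split()[1] for line in master if line.strip())

    for name, count in expected.items():
        with open(name) as f:
            # Compare the real line count of the subsidiary file.
            if sum(1 for _ in f) != count:
                return False
    return True
```

If this returns False, some subsidiary file will either run out early or have lines left over once the master file is exhausted.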

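If I only had two files that I wanted to open, and I knew their names in advance, I could do something like this:

# yield is only valid inside a function, so the loop is wrapped in one
def pair_lines(master_file, hidden_file):
    with open(master_file, 'r') as a, open(hidden_file, 'r') as b:
        for line1, line2 in zip(a, b):
            yield (line1, line2)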

But the dilemma is that until I read a given line of the master file, I don't know which subsidiary file to read. On top of that, I have to construct a single generator that interleaves lines from several different files.

Upvotes: 1

Views: 337

Answers (2)

Giacomo Alzetta

Reputation: 2479

You can keep a "cache" of open files and call fileobj.readline() when needed:

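def read_master_file(master):
    other_files = {}  # cache: file name -> open file object
    try:
        for line in master:
            data, name = line.split()
            if name not in other_files:
                other_files[name] = open(name)
            # readline() returns the next line of that file, '\n' included
            yield data, other_files[name].readline()
    finally:
        # close the cached files even if the caller stops iterating early
        for f in other_files.values():
            f.close()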

Used as:

with open('master') as master:
    for data, line in read_master_file(master):
        ...  # do stuff with data and line

Unfortunately, this is one of the cases where you have to open files without a with statement, since you do not know in advance how many files you will have to deal with.

You could write a custom context manager to hold the "cache" to achieve something like:

def read_master_file(master):
    with OtherFiles() as other_files:
        for line in master:
            data, name = line.split()
            yield data, other_files.get_file(name).readline()

Here get_file looks up the cache and opens the file if it is not already open, and the __exit__ method of OtherFiles closes all of the opened files.
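Below is a minimal sketch of what such an OtherFiles class could look like. The class name and the get_file method come from the snippet above. The body is one possible implementation, not a library API. It keeps a dict of open file objects keyed by name and closes all of them in __exit__.

```python
class OtherFiles:
    """Sketch of the context manager described above: a cache of open
    files, keyed by file name, that are all closed on exit."""

    def __init__(self):
        self._files = {}

    def get_file(self, name):
        # Open the file on first request; reuse the cached object afterwards.
        if name not in self._files:
            self._files[name] = open(name)
        return self._files[name]

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Close every file opened through get_file, even after an error.
        for f in self._files.values():
            f.close()
        self._files.clear()
        return False  # do not suppress exceptions


def read_master_file(master):
    with OtherFiles() as other_files:
        for line in master:
            data, name = line.split()
            yield data, other_files.get_file(name).readline()
```

Because the with block lives inside the generator, the files are closed when the generator finishes or is closed early.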

But if this is the only place where it is used, writing a custom context manager is probably not worth it.

Upvotes: 0

Olivier Melançon

Reputation: 22314

You want to use an ExitStack. This is a helper class provided by the standard-library contextlib module for combining context managers. It can keep a variable number of files open inside a single with statement.

from contextlib import ExitStack

def iter_master_file(filename):
    with ExitStack() as stack:
        master = stack.enter_context(open(filename))
        hidden_files = {}

        for line in master:
            # You can parse the lines as you like
            # Here I just assume the last word is a file name
            *data, file = line.split()

            if file not in hidden_files:
                hidden_files[file] = stack.enter_context(open(file))

            yield ' '.join(data), next(hidden_files[file]).strip()
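One caveat applies on Python 3.7 and later. If a hidden file has fewer lines than the master file expects, next() raises StopIteration inside the generator, and PEP 479 turns that into a fairly opaque RuntimeError ("generator raised StopIteration"). Below is a sketch of a variant that passes a default to next() and raises a clearer error instead. The name iter_master_file_checked and the choice of ValueError are illustrative assumptions, not part of the original answer.

```python
from contextlib import ExitStack


def iter_master_file_checked(filename):
    with ExitStack() as stack:
        master = stack.enter_context(open(filename))
        hidden_files = {}

        for line in master:
            # As above, assume the last word on the line is a file name.
            *data, file = line.split()

            if file not in hidden_files:
                hidden_files[file] = stack.enter_context(open(file))

            # With a default, next() returns None instead of raising
            # StopIteration once the hidden file is exhausted.
            hidden_line = next(hidden_files[file], None)
            if hidden_line is None:
                raise ValueError(f'{file} has fewer lines than {filename} expects')

            yield ' '.join(data), hidden_line.strip()
```

The ExitStack still closes every opened file when the ValueError propagates out of the with block.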

Example

Let's set up a few files for this example.

Files

master.txt

master says hidden1.txt is: hidden1.txt
master says hidden2.txt is: hidden2.txt
master says hidden1.txt is: hidden1.txt
master says hidden2.txt is: hidden2.txt

hidden1.txt

I am hidden file 1 line 1
I am hidden file 1 line 2

hidden2.txt

I am hidden file 2 line 1
I am hidden file 2 line 2

Here is the actual example.

Code

for data, hidden_data in iter_master_file('master.txt'):
    print(data, hidden_data)

Output

master says hidden1.txt is: I am hidden file 1 line 1
master says hidden2.txt is: I am hidden file 2 line 1
master says hidden1.txt is: I am hidden file 1 line 2
master says hidden2.txt is: I am hidden file 2 line 2

Upvotes: 2
