Reputation: 2647
I have a master file and a set of subsidiary files, but I don't know the names of the subsidiary files until I look in the master file.
The master file contains two columns: some data and a second file name, e.g.,
data1_from_master hidden_file1
data2_from_master hidden_file2
data3_from_master hidden_file1
data4_from_master hidden_file3
data5_from_master hidden_file1
What I want to do is create a generator that yields an element from the first column of the master file, and then a line of data from one of the subsidiary files. For example,
data1_from_master line1_from_file1
data2_from_master line1_from_file2
data3_from_master line2_from_file1
data4_from_master line1_from_file3
data5_from_master line3_from_file1
The number of lines in the master file is equal to the sum of the number of lines in all the subsidiary files, so once the master file has been traversed, all of the subsidiary files will have been traversed as well.
If I only had two files that I wanted to open, and I knew their names in advance, I could do something like:
with open(master_file, 'r') as a, open(hidden_file, 'r') as b:
    for line1, line2 in zip(a, b):
        yield (line1, line2)
But the dilemma is that until I read a given line of the master file, I don't know which subsidiary file to read. And then there's the added complexity of constructing a generator whose lines come from multiple different files.
Upvotes: 1
Views: 337
Reputation: 2479
You can keep a "cache" of open files and call fileobj.readline() when needed:
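def read_master_file(master):
    other_files = {}  # cache of open subsidiary files, keyed by file name
    try:
        for line in master:
            data, name = line.split()
            if name not in other_files:
                other_files[name] = open(name)
            yield data, other_files[name].readline()
    finally:
        # Close every cached file, even if the caller stops iterating early
        for f in other_files.values():
            f.close()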
Used as:
with open('master') as master:
    for data, line in read_master_file(master):
        ...  # do stuff
Unfortunately, this is one of the cases where you have to open files without a with statement, since you do not know in advance how many files you will have to deal with.
You could write a custom context manager to hold the "cache" to achieve something like:
def read_master_file(master):
    with OtherFiles() as other_files:
        for line in master:
            data, name = line.split()
            yield data, other_files.get_file(name).readline()
Here, get_file looks up the cache, opening the file if it is not there yet, and the __exit__ method of OtherFiles closes all of the opened files.
But if this is the only place where it will be used, a custom context manager is probably not worth the effort.
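For completeness, here is a minimal sketch of what such a context manager could look like. Only the names OtherFiles and get_file come from the usage above; the implementation, including the private _files dictionary, is an assumption.

```python
class OtherFiles:
    """Hypothetical cache of open files that closes them all on exit."""

    def __init__(self):
        self._files = {}  # assumed internal cache: file name -> open file object

    def __enter__(self):
        return self

    def get_file(self, name):
        # Open the file on first request, then reuse the same handle
        if name not in self._files:
            self._files[name] = open(name)
        return self._files[name]

    def __exit__(self, exc_type, exc_value, traceback):
        # Close everything that was opened through get_file
        for f in self._files.values():
            f.close()
        self._files.clear()
        return False  # do not swallow exceptions raised inside the with block
```

Because get_file always returns the same handle for a given name, repeated readline() calls continue from where the previous one stopped.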
Upvotes: 0
Reputation: 22314
You want to use an ExitStack. This is a helper class provided by the contextlib module that lets you combine context managers. It can be used to keep multiple files open in a single with statement.
from contextlib import ExitStack

def iter_master_file(filename):
    with ExitStack() as stack:
        master = stack.enter_context(open(filename))
        hidden_files = {}
        for line in master:
            # You can parse the lines as you like
            # Here I just assume the last word is a file name
            *data, file = line.split()
            if file not in hidden_files:
                hidden_files[file] = stack.enter_context(open(file))
            yield ' '.join(data), next(hidden_files[file]).strip()
Let's set up a few files for this example.
master.txt:
master says hidden1.txt is: hidden1.txt
master says hidden2.txt is: hidden2.txt
master says hidden1.txt is: hidden1.txt
master says hidden2.txt is: hidden2.txt
hidden1.txt:
I am hidden file 1 line 1
I am hidden file 1 line 2
hidden2.txt:
I am hidden file 2 line 1
I am hidden file 2 line 2
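If you want to reproduce this, the files can be created with a short script. This is only a sketch: write_example_files is a hypothetical helper, and the file names master.txt, hidden1.txt and hidden2.txt are taken from the example itself.

```python
from pathlib import Path


def write_example_files(directory="."):
    """Create master.txt, hidden1.txt and hidden2.txt in the given directory."""
    base = Path(directory)
    # Each master line ends with the name of the hidden file to read from
    master_lines = [
        "master says hidden1.txt is: hidden1.txt",
        "master says hidden2.txt is: hidden2.txt",
        "master says hidden1.txt is: hidden1.txt",
        "master says hidden2.txt is: hidden2.txt",
    ]
    (base / "master.txt").write_text("\n".join(master_lines) + "\n")
    for n in (1, 2):
        lines = [f"I am hidden file {n} line {i}" for i in (1, 2)]
        (base / f"hidden{n}.txt").write_text("\n".join(lines) + "\n")
```

Calling write_example_files() with no argument writes into the current directory. That matters because the hidden file names in the master file are relative, so open() resolves them against the current working directory.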
Here is the actual example.
for data, hidden_data in iter_master_file('master.txt'):
    print(data, hidden_data)
Output:
master says hidden1.txt is: I am hidden file 1 line 1
master says hidden2.txt is: I am hidden file 2 line 1
master says hidden1.txt is: I am hidden file 1 line 2
master says hidden2.txt is: I am hidden file 2 line 2
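One caveat with next(): if a hidden file runs out of lines before the master file does, next() raises StopIteration, and since Python 3.7 a StopIteration that escapes a generator body is re-raised as a RuntimeError. The question states that the line counts match, so this should not happen there, but a defensive variant might look like the sketch below. The name iter_master_file_checked and the choice of ValueError are assumptions, not part of the original code.

```python
from contextlib import ExitStack


def iter_master_file_checked(filename):
    """Variant of iter_master_file that reports a too-short hidden file explicitly."""
    with ExitStack() as stack:
        master = stack.enter_context(open(filename))
        hidden_files = {}
        for line in master:
            # As before, the last word on the line is taken to be a file name
            *data, file = line.split()
            if file not in hidden_files:
                hidden_files[file] = stack.enter_context(open(file))
            # next() with a default returns None at end of file instead of raising
            hidden_line = next(hidden_files[file], None)
            if hidden_line is None:
                raise ValueError(f"{file} has fewer lines than the master file expects")
            yield ' '.join(data), hidden_line.strip()
```

The ExitStack still closes every opened file when the ValueError propagates out of the with block.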
Upvotes: 2