Reputation: 20245
I have the following functions:
def read_data(file_location):
    for line in open(file_location):
        # pre-process the line
        yield line

def transform_1(data):
    for line in data:
        # change line in some way
        yield line

def transform_2(data):
    for line in data:
        # change line in some other way
        yield line

def process_file(file_location):
    # Some description
    #
    # returns:
    #     generator
    data = read_data(file_location)
    data = transform_1(data)
    data = transform_2(data)
    return data
What I am trying to do is read lines from a file, transform each line with a number of functions, and then do something with the resulting lines. I don't want to read all lines at once as the file is quite big.
My question is whether I am doing this the right way. The code executes correctly, but the program execution in my head feels convoluted, to the point where I have no idea whether I will be able to handle this code in a month or so.
So what I want to know is: Is there some kind of programming pattern that shows how to properly chain generators into each other?
Upvotes: 2
Views: 533
Reputation: 526
Actually, this is well done. I'm not sure why the code feels convoluted to you. The fact that each function does one thing only is a plus. Obviously, the function names should reflect the kinds of transformations being made. Code like this is very testable and maintainable. When you need to make a change to the pipeline six months from now, you might be surprised how easy it is to find the part that needs adjusting and make the change.
I would suggest modifying your read_data generator as follows, so that the file is closed once the generator is exhausted or closed:
def read_data(file_location):
    with open(file_location) as f:
        for line in f:
            yield line
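If it is the repeated reassignment of data that feels convoluted, one option is to make the chaining explicit with a small helper. The following is only a sketch: pipeline, strip_newlines and drop_blank are hypothetical names, not part of your code, and the two stages are placeholders for whatever your transforms really do. Because every stage accepts any iterable, each one can be tested with a plain list instead of a file.

```python
from functools import reduce


def pipeline(source, *stages):
    """Feed `source` through each generator stage in order (hypothetical helper)."""
    return reduce(lambda data, stage: stage(data), stages, source)


def strip_newlines(lines):
    # Placeholder stage: remove the trailing newline from each line.
    for line in lines:
        yield line.rstrip("\n")


def drop_blank(lines):
    # Placeholder stage: skip lines that are empty after stripping.
    for line in lines:
        if line:
            yield line


# A list stands in for the file, so no I/O is needed to test the stages.
sample = ["a\n", "\n", "b\n"]
result = list(pipeline(sample, strip_newlines, drop_blank))
```

With such a helper, process_file could reduce to returning pipeline(read_data(file_location), transform_1, transform_2). Nothing is read from the file until the returned generator is iterated.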
Upvotes: 1
Reputation: 41248
Assuming each line is transformed in the same way, you could apply your transform functions to each line and use a generator to iterate over all lines. Personally, I find this more readable.
def transform_1(line):
    return line.replace(' ', '')  # example of transformation

def transform_2(line):
    return line.strip('#')

def process_file(file_location):
    with open(file_location) as in_f:
        for line in in_f:
            yield transform_2(transform_1(line))
Depending on what the transforms do, they could possibly be combined into a single function, but it's hard to know without more context.
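As a rough sketch of what combining them could look like, per-line functions can be composed into one callable and applied lazily. The names compose, clean_line and process_lines are made up for this example, and the two transforms are just the example ones from this answer, not your real ones.

```python
from functools import reduce


def compose(*funcs):
    """Return a function applying `funcs` left to right (hypothetical helper)."""
    return lambda value: reduce(lambda acc, func: func(acc), funcs, value)


def transform_1(line):
    return line.replace(' ', '')  # example transformation from this answer


def transform_2(line):
    return line.strip('#')  # example transformation from this answer


# One combined per-line function built from the individual steps.
clean_line = compose(transform_1, transform_2)


def process_lines(lines):
    # Generator expression: each line is transformed only when requested.
    return (clean_line(line) for line in lines)


cleaned = list(process_lines(["# a b\n", "c d#"]))
```

Note that strip('#') only removes '#' characters at the very ends of the string, so a line that still ends with a newline keeps any '#' sitting just before it.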
Upvotes: 3