Arne

Reputation: 20245

How to write a good generator-chaining function

I have the following functions:

def read_data(file_location):
    for line in open(file_location):
        # pre-process the line  
        yield line

def transform_1(data):
    for line in data:
        # change line in some way
        yield line

def transform_2(data):
    for line in data:
        # change line in some other way
        yield line

def process_file(file_location):
    # Some description
    #
    # returns:
    #     generator
    data = read_data(file_location)
    data = transform_1(data)
    data = transform_2(data)
    return data

I am trying to read lines from a file, transform each line with a number of functions, and then do something with the resulting lines. I don't want to read all lines at once because the file is quite big.

My question is whether I am doing this the right way. The code executes correctly, but the program flow feels convoluted in my head, to the point where I have no idea whether I will be able to handle this code in a month or so.

So what I want to know is: Is there some kind of programming pattern that shows how to properly chain generators into each other?

Upvotes: 2

Views: 533

Answers (2)

Greg Mueller

Reputation: 526

Actually, this is well done. I'm not sure why the code feels convoluted to you. The fact that each function does only one thing is a plus. Obviously, the function names should reflect the kinds of transformations being made. Code like this is very testable and maintainable. When you need to change the pipeline six months from now, you might be surprised how easy it is to find the part that needs adjusting and make the change.
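
As a rough illustration of the testability point: each stage only needs an iterable of lines, so it can be exercised with a plain list instead of a file. The transform body below (stripping trailing whitespace) is a made-up stand-in, since the question does not say what the transforms actually do.

```python
def transform_1(data):
    # Hypothetical stand-in for the real transformation:
    # strip trailing whitespace (including the newline) from each line.
    for line in data:
        yield line.rstrip()

# Any iterable of strings works as input, so no file is needed for a test.
sample = ["first line  \n", "second line\n"]
result = list(transform_1(sample))
assert result == ["first line", "second line"]
```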

I would suggest modifying your read_data generator as follows, so that the file is closed once the generator is exhausted or closed:

def read_data(file_location):
    with open(file_location) as f:
        for line in f:
            yield line
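
On the original question of a chaining pattern: if the number of stages grows, one possible way to express the chain is a small helper that feeds each generator into the next. This is only a sketch. `chain_stages` is a hypothetical name, not something from the standard library, and `strip_newlines` and `to_upper` are placeholder stages invented for the example.

```python
def chain_stages(source, *stages):
    # Feed the source iterable through each stage in order.
    # Each stage takes an iterable and returns an iterable (e.g. a generator),
    # mirroring the repeated "data = transform_n(data)" lines in the question.
    data = source
    for stage in stages:
        data = stage(data)
    return data

def strip_newlines(data):
    # Placeholder stage: drop the trailing newline from each line.
    for line in data:
        yield line.rstrip("\n")

def to_upper(data):
    # Placeholder stage: upper-case each line.
    for line in data:
        yield line.upper()

lines = ["a\n", "b\n"]
pipeline = chain_stages(lines, strip_newlines, to_upper)
print(list(pipeline))  # ['A', 'B']
```

With a helper like this, the body of process_file could shrink to `return chain_stages(read_data(file_location), transform_1, transform_2)`. Nothing is read until the returned generator is iterated, so the file is still processed line by line.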

Upvotes: 1

Chris_Rands

Reputation: 41248

Assuming each line is transformed in the same way, you could apply your transform functions to each line and use a single generator to iterate over all lines. Personally, I find this more readable.

def transform_1(line):
    return line.replace(' ','') # example of transformation

def transform_2(line):
    return line.strip('#')

def process_file(file_location):
    with open(file_location) as in_f:
        for line in in_f:
            yield transform_2(transform_1(line))

Depending on what the transforms do, they could possibly be combined into a single function, but it's hard to know without more context.
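
One way to sketch that combination, assuming every transform takes a single line and returns a single line: `compose` below is a hypothetical helper (not a standard-library function), and the two transforms are the example ones from above.

```python
from functools import reduce

def transform_1(line):
    return line.replace(' ', '')  # example transformation from above

def transform_2(line):
    return line.strip('#')

def compose(*funcs):
    # Build one function that applies funcs from left to right.
    def combined(line):
        return reduce(lambda value, func: func(value), funcs, line)
    return combined

clean = compose(transform_1, transform_2)

# map() is lazy in Python 3, so lines are still processed one at a time.
lines = ["# a b #", "c d"]
print(list(map(clean, lines)))  # ['ab', 'cd']
```

Inside process_file, the loop body would then become `yield clean(line)`.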

Upvotes: 3
