Is this a bad use of a `yield` statement?

Question

I was taking a look at the code of a coworker and I felt like this was an unnecessary use of the yield statement. It was something like this:

import re
from typing import List

def standardize_text(text: str):
    pattern = r"ABC" # some regex
    yield re.sub(pattern, "X", text)

def preprocess_docs(docs: List[str]):
    for doc in docs:
        yield standardize_text(doc)

I understand the use of yield in preprocess_docs so that I can return a generator, which would be helpful if docs is a large list. But I don't understand the value of the yield in the standardize_text function. To me, a return statement would do the exact same thing.

Is there a reason why that yield would be useful?

jtbandes · Accepted Answer

> To me, a return statement would do the exact same thing.

Using return instead of yield would not do the same thing, as ShadowRanger's comment explains.

With yield, calling the function gives you a generator object:

>>> standardize_text("ABCD")
<generator object standardize_text at 0x...>

Generators can produce more than one result (unlike functions that use return). This generator happens to produce exactly one item, which is a string (the result of re.sub). You can collect the generator's results with list(), for example, or grab just the first result with next():

>>> list(standardize_text("ABCD"))
['XD']

>>> g = standardize_text("ABCD")
>>> next(g)
'XD'
>>> next(g) # raises StopIteration, indicating the generator has finished
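The same behavior can be checked in a script. This is only a minimal sketch: the variable names are made up for illustration, and it relies on the fact that next() accepts an optional default, which is returned instead of raising StopIteration once the generator is exhausted.

```python
import re
import types


def standardize_text(text: str):
    pattern = r"ABC"  # some regex
    yield re.sub(pattern, "X", text)


gen = standardize_text("ABCD")

# Calling the function runs none of its body yet; it only builds a generator.
is_generator = isinstance(gen, types.GeneratorType)

first = next(gen)         # runs the body up to the yield
second = next(gen, None)  # generator is exhausted, so the default is returned

print(is_generator, first, second)
```

Passing a default to next() is handy when you expect at most one item and would rather test for None than catch an exception.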

If we change the function to use return:

def standardize_text(text: str):
    pattern = r"ABC" # some regex
    return re.sub(pattern, "X", text)

Then calling the function gives us the single result directly; no list() or next() is needed.

>>> standardize_text("ABCD")
'XD'

> Is there a reason why that yield would be useful?

In the standardize_text function, no, not really. But your preprocess_docs function does benefit from yield, because it produces more than one value: calling it returns a generator that yields one result for each item in docs. Those results are either generators themselves (in your original code, where standardize_text uses yield) or strings (if we change standardize_text to use return).

def preprocess_docs(docs: List[str]):
    for doc in docs:
        yield standardize_text(doc)

# returns a generator because the implementation uses "yield"
>>> preprocess_docs(["ABCD", "AAABC"])
<generator object preprocess_docs at 0x...>

# with standardize_text using "yield re.sub..."
>>> for x in preprocess_docs(["ABCD", "AAABC"]): print(x)
...
<generator object standardize_text at 0x...>
<generator object standardize_text at 0x...>

# with standardize_text using "return re.sub..."
>>> for x in preprocess_docs(["ABCD", "AAABC"]): print(x)
... 
XD
AAX
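If standardize_text had to stay a generator for some reason (this is an assumption, not something the question states), one possible way to still get strings out of preprocess_docs is to delegate with yield from. A minimal sketch, with the Iterator type hints added only for illustration:

```python
import re
from typing import Iterator, List


def standardize_text(text: str) -> Iterator[str]:
    pattern = r"ABC"  # some regex
    yield re.sub(pattern, "X", text)  # still a generator, as in the original code


def preprocess_docs(docs: List[str]) -> Iterator[str]:
    for doc in docs:
        # Delegate to the inner generator so its items are yielded directly,
        # rather than yielding the generator object itself.
        yield from standardize_text(doc)


results = list(preprocess_docs(["ABCD", "AAABC"]))
print(results)
```

This only flattens the nesting. Switching standardize_text to return is still the simpler fix when it produces a single value.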

Note: Before async/await was added in Python 3.5, some concurrency libraries used yield in much the same way that await is used now; Twisted's @inlineCallbacks is one example. I don't think this is directly relevant to your question, but I've included it for completeness.
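To give a rough idea of what "yield used like await" means, here is a toy sketch. It is not Twisted's API: run, task and fetch_length are hypothetical names, and the "asynchronous operation" is just a plain callable that the driver invokes.

```python
from typing import Any, Callable, Generator


def fetch_length(text: str) -> Callable[[], int]:
    # Hypothetical stand-in for an asynchronous operation:
    # a callable that the driver runs later.
    return lambda: len(text)


def task() -> Generator[Callable[[], Any], Any, int]:
    # Each yield hands an operation to the driver and pauses until
    # the driver sends the operation's result back in.
    a = yield fetch_length("ABCD")
    b = yield fetch_length("AAABC")
    return a + b


def run(gen):
    """Drive a generator-based coroutine to completion (toy driver, not Twisted)."""
    result = None
    try:
        while True:
            operation = gen.send(result)  # send(None) starts the generator on the first pass
            result = operation()
    except StopIteration as stop:
        # The generator's return value travels on the StopIteration exception.
        return stop.value


total = run(task())
print(total)
```

Here the value of each yield expression is whatever the driver sends back, which is the role await plays in modern code.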
