Reputation: 67
I was looking at a coworker's code and felt that it made unnecessary use of the yield statement. It was something like this:
import re
from typing import List

def standardize_text(text: str):
    pattern = r"ABC"  # some regex
    yield re.sub(pattern, "X", text)

def preprocess_docs(docs: List[str]):
    for doc in docs:
        yield standardize_text(doc)
I understand the use of yield in preprocess_docs so that I can return a generator, which would be helpful if docs is a large list. But I don't understand the value of the yield in the standardize_text function. To me, a return statement would do the exact same thing.
Is there a reason why that yield would be useful?
Upvotes: 0
Views: 438
Reputation: 118761
"To me, a return statement would do the exact same thing."
Using return instead wouldn't be the same as yield, as explained in ShadowRanger's comment.
With yield, calling the function gives you a generator object:
>>> standardize_text("ABCD")
<generator object standardize_text at 0x10561f740>
Generators can produce more than one result (unlike functions that use return). This generator happens to produce exactly one item, which is a string (the result of re.sub). You can collect the generator's results with list(), for example, or just grab the first result with next():
>>> list(standardize_text("ABCD"))
['XD']
>>> g = standardize_text("ABCD")
>>> next(g)
'XD'
>>> next(g) # raises StopIteration, indicating the generator has finished
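Two details of the session above are easy to miss: the body of a generator function does not run until the first next() call, and next() accepts a default that is returned instead of raising StopIteration. A minimal sketch, assuming the same standardize_text as in the question; the calls list is a hypothetical addition used only to record when the body runs:

```python
import re

calls = []  # hypothetical helper: records when the function body actually runs

def standardize_text(text: str):
    calls.append(text)
    yield re.sub(r"ABC", "X", text)

g = standardize_text("ABCD")
assert calls == []      # calling the function only created the generator

first = next(g)         # the body now runs up to the yield
second = next(g, None)  # exhausted: the default is returned, no StopIteration
```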
If we change the function to use return:
def standardize_text(text: str):
    pattern = r"ABC"  # some regex
    return re.sub(pattern, "X", text)
Then calling the function gives us the single result directly, with no list() or next() needed.
>>> standardize_text("ABCD")
'XD'
"Is there a reason why that yield would be useful?"
In the standardize_text function, no, not really. But your preprocess_docs function actually does make use of returning more than one value with yield: it returns a generator with one result for each of the values in docs. Those results are either generators themselves (in your original code with yield) or strings (if we change standardize_text to use return).
def preprocess_docs(docs: List[str]):
    for doc in docs:
        yield standardize_text(doc)
# returns a generator because the implementation uses "yield"
>>> preprocess_docs(["ABCD", "AAABC"])
<generator object preprocess_docs at 0x10561f820>
# with standardize_text using "yield re.sub..."
>>> for x in preprocess_docs(["ABCD", "AAABC"]): print(x)
...
<generator object standardize_text at 0x1056cce40>
<generator object standardize_text at 0x1056cceb0>
# with standardize_text using "return re.sub..."
>>> for x in preprocess_docs(["ABCD", "AAABC"]): print(x)
...
XD
AAX
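If the helper had to stay a generator for some reason, the outer function could still hand plain strings to its caller by delegating with yield from instead of yielding the inner generator object. This is only a sketch of that alternative next to the simpler return version; the names standardize_text_gen and preprocess_docs_flat are hypothetical and not from the original code:

```python
import re
from typing import Iterator, List

def standardize_text_gen(text: str) -> Iterator[str]:
    # Generator variant, as in the question: yields exactly one string.
    yield re.sub(r"ABC", "X", text)

def standardize_text(text: str) -> str:
    # Plain variant: returns the string directly.
    return re.sub(r"ABC", "X", text)

def preprocess_docs_flat(docs: List[str]) -> Iterator[str]:
    for doc in docs:
        # "yield from" re-yields each item of the inner generator,
        # so callers see strings rather than generator objects.
        yield from standardize_text_gen(doc)

def preprocess_docs(docs: List[str]) -> Iterator[str]:
    for doc in docs:
        yield standardize_text(doc)

flat = list(preprocess_docs_flat(["ABCD", "AAABC"]))
plain = list(preprocess_docs(["ABCD", "AAABC"]))
```

Both variants produce the same strings here, which suggests the return version is the simpler choice when the helper only ever has one result.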
Note: Before async/await was introduced in Python 3.5, some concurrency libraries used yield in the same way that await is now used, for example Twisted's @inlineCallbacks decorator. I don't think this is directly relevant to your question, but I included it for completeness.
Upvotes: 1