user8897793

Reputation: 23

Understanding a line of code that uses zip_longest from Python's itertools module

I know this is a really specific question, but I hope it may help someone else as well. Could you please help me fully understand this line of code in Python:

    new_word = ''.join(c if c == g else '_'
                       for c, g in zip_longest(correct, GUESS, fillvalue='_'))

I have a code review soon, and this is the only line of code which I don't fully understand. I am not familiar with using 'if' and 'for' all in one line. How can I rewrite it in several statements?

This is the equivalent implementation given in the Python documentation for zip_longest, but I still do not understand exactly how it works in the context above:

from itertools import repeat  # repeat() supplies the fill value once an iterator is exhausted

def zip_longest(*args, fillvalue=None):
    # zip_longest('ABCD', 'xy', fillvalue='-') --> Ax By C- D-
    iterators = [iter(it) for it in args]
    num_active = len(iterators)
    if not num_active:
        return
    while True:
        values = []
        for i, it in enumerate(iterators):
            try:
                value = next(it)
            except StopIteration:
                num_active -= 1
                if not num_active:
                    return
                iterators[i] = repeat(fillvalue)
                value = fillvalue
            values.append(value)
        yield tuple(values)

Upvotes: 0

Views: 1010

Answers (2)

benvc

Reputation: 15120

The code in your question zips together two iterables (probably strings) in order to compare each pair of characters, choose a character to return, and concatenate the resulting characters back into a string. It looks like code from a hangman-style word game.

You could accomplish the same thing in slightly longer form by separating the join from the loop, which makes it easier to see how it works. The following example uses the words "hangman" and "winger" (a detailed explanation follows the code):

from itertools import zip_longest

chars = []
correct = 'hangman'
guess = 'winger'
for c, g in zip_longest(correct, guess, fillvalue='_'):
    if c == g:
        chars.append(c)
    else:
        chars.append('_')

print(chars)
# ['_', '_', 'n', 'g', '_', '_', '_']

word = ''.join(chars)
print(word)
# __ng___

On each iteration, the expression c if c == g else '_' checks whether the characters at the same position in each word are the same. If they are, the matching character is added to the output list; if not, an underscore is added. On the first iteration with our example words, c is "h" and g is "w", so the characters differ and an underscore is added. On the third iteration both characters are "n", so "n" is added. The fillvalue argument makes zip_longest iterate over the longer word completely, substituting an underscore for each character missing from the shorter word.

The resulting output list would be ['_', '_', 'n', 'g', '_', '_', '_']. This is because the first two characters in each example word are different (resulting in an underscore), the third and fourth characters are the same (resulting in the matching character), the fifth and sixth characters are different, and the last character in "hangman" is compared to the fillvalue since "winger" is one character shorter than "hangman".
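To see the effect of fillvalue in isolation, here is a small sketch that compares the built-in zip with zip_longest on the same two example words. The variable names short_pairs and long_pairs are only illustrative.

```python
from itertools import zip_longest

correct = 'hangman'
guess = 'winger'

# zip stops at the end of the shorter word, so the final 'n' of "hangman" is dropped
short_pairs = list(zip(correct, guess))

# zip_longest keeps going and pads the shorter word with fillvalue
long_pairs = list(zip_longest(correct, guess, fillvalue='_'))

print(len(short_pairs))  # 6
print(len(long_pairs))   # 7
print(long_pairs[-1])    # ('n', '_')
```

The last pair compares "n" with the fill value "_", which is why the final position of the result is an underscore.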

Finally, ''.join() joins the list of characters into a string that looks like "__ng___".

Upvotes: 1

glibdud

Reputation: 7850

The for clause inside the call to join is a generator expression, as defined in PEP 289, and c if c == g else '_' is a conditional expression that picks one of two values. You could rewrite the line as follows:

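from itertools import zip_longest

# correct and GUESS are the two strings already defined in the question's program
new_word = ''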
for c, g in zip_longest(correct, GUESS, fillvalue='_'):
    if c == g:
        new_word += c
    else:
        new_word += '_'

In plain terms, new_word will be the same length as the longer of correct and GUESS, and will consist of underscore characters in all positions except those where correct and GUESS are identical.
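If you want to convince yourself that the two forms are equivalent, a minimal sketch like the following may help. The function names mask_with_genexpr and mask_with_loop are made up for this illustration and are not part of the original program.

```python
from itertools import zip_longest


def mask_with_genexpr(correct, guess):
    # One-line form from the question: generator expression inside join
    return ''.join(c if c == g else '_'
                   for c, g in zip_longest(correct, guess, fillvalue='_'))


def mask_with_loop(correct, guess):
    # Multi-statement form: same pairs, same comparison, built up step by step
    new_word = ''
    for c, g in zip_longest(correct, guess, fillvalue='_'):
        if c == g:
            new_word += c
        else:
            new_word += '_'
    return new_word


# Both versions should agree, including when the words differ in length
for correct, guess in [('hangman', 'winger'), ('cat', 'cart'), ('', 'abc')]:
    assert mask_with_genexpr(correct, guess) == mask_with_loop(correct, guess)

print(mask_with_genexpr('cat', 'cart'))  # ca__
```

In the 'cat' / 'cart' case the guess is the longer word, so the result has four characters: the last position compares the fill value with "t" and yields an underscore.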

Upvotes: 2
