fullmetal04

Reputation: 11

Trouble with understanding list comprehension

def reverse_words(text):
    return " ".join([x[::-1] for x in text.split(" ")])

I'm a little confused about the difference between the list comprehension and the for loop here.

for x in text.split(" "):
    return " ".join(x[::-1])

I thought these two would be the same, but they have different outputs. Can someone explain why?

Upvotes: 0

Views: 74

Answers (1)

alvas

Reputation: 122042

First, expand the list comprehension into a normal-looking loop.

From:

def reverse_words(text):
    return " ".join([x[::-1] for x in text.split(" ")])

to

def reverse_words(text):
    output_list = []
    for x in text.split(" "):
        output_list.append(x[::-1])
    return " ".join(output_list)

Then let's take a closer look at the "weird" parts, e.g. x[::-1].

What is x[::-1]?

This is a slice with a step of -1, which is a shortcut for getting a reversed copy of a sequence such as a list, e.g.

>>> x = [1,2,3,4,5]
>>> x[::-1]
[5, 4, 3, 2, 1]

Now in string type:

>>> x = "abcde"
>>> x[::-1]
'edcba'
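
To spell the slice out a bit more: x[::-1] is the general form x[start:stop:step] with start and stop left empty and a step of -1. Here is a small sketch of that; the variable names are only for illustration:

```python
word = "abcde"

# Slice syntax is sequence[start:stop:step]; a step of -1 walks backwards.
reversed_word = word[::-1]

# Slicing builds a new object and leaves the original untouched.
print(reversed_word)  # edcba
print(word)           # abcde

# For strings, joining reversed() gives the same result.
same_result = "".join(reversed(word))
print(reversed_word == same_result)  # True
```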

For more details, see

We have this weird text.split(" ")

This is commonly used in NLP (natural language processing) tasks to split a sentence into words, also known as word tokenization, e.g.

>>> text = "This is a sentence"
>>> text.split(" ")
['This', 'is', 'a', 'sentence']

So text.split(" ") roughly returns the "individual words" (there are many nuances to word tokenization; str.split(" ") is about the simplest approach for English text).

Combining text.split(" ") with x[::-1]

Let's use a more sensible variable name than x here. Essentially, we are doing this:
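
One of those nuances shows up as soon as the text has more than one space in a row. As a rough illustration (the sample sentence is made up), split(" ") and split() with no argument behave differently:

```python
text = "This  is a sentence"  # note the two spaces after "This"

# split(" ") splits on every single space, so two spaces in a row
# produce an empty string between them.
tokens_single_space = text.split(" ")
print(tokens_single_space)  # ['This', '', 'is', 'a', 'sentence']

# split() with no argument splits on runs of whitespace and
# never returns empty strings.
tokens_any_whitespace = text.split()
print(tokens_any_whitespace)  # ['This', 'is', 'a', 'sentence']
```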

# Word tokenization, so "abc def" -> ["abc", "def"]
for word in text.split(" "): 
    # Reverse each word, so "abc" -> "cba"
    print(word[::-1])

And what's with this last part on " ".join(...)?

str.join is a string method (see https://docs.python.org/3/library/stdtypes.html#str.join); it joins the items of an iterable of strings, such as a list, using the string you call it on as the separator.

Here are some examples:

>>> list_of_words = ["abc", "def", "xyz"]
>>> type(list_of_words)
<class 'list'>

>>> " ".join(list_of_words)
'abc def xyz'

>>> "-".join(list_of_words)
'abc-def-xyz'

>>> "-haha-".join(list_of_words)
'abc-haha-def-haha-xyz'

>>> output = " ".join(list_of_words)
>>> type(output)
<class 'str'>
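
One detail that matters for the loop in the question: a str is itself an iterable of characters, so str.join also accepts a single string and puts the separator between its characters. A short sketch, with variable names chosen only for illustration:

```python
word = "abc"

# Passing a plain string: join iterates over its characters.
spaced_out = " ".join(word)
print(spaced_out)  # a b c

# Passing a one-item list: there is nothing to separate, so the word is unchanged.
unchanged = " ".join([word])
print(unchanged)  # abc
```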

Putting it all together, and expanding the list comprehension

We get this:

def reverse_words_in_text(text):
    # Keep a list of the reversed words.
    output_list_of_words = []

    # Word tokenization, so "abc def" -> ["abc", "def"]
    for word in text.split(" "): 
        # Reverse each word, so "abc" -> "cba"
        output_list_of_words.append(word[::-1])

    # Join back the tokens into a string.
    return " ".join(output_list_of_words)
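
With that expansion in hand, the difference from the loop in the question becomes visible. Below is a sketch that wraps the question's loop in a function (reverse_words_loop_attempt is a made-up name) next to the original comprehension version:

```python
def reverse_words_loop_attempt(text):
    # Hypothetical wrapper around the for-loop from the question.
    for x in text.split(" "):
        # return exits the function during the first iteration, so only
        # the first word is ever processed. join then receives a single
        # string and spaces out its characters.
        return " ".join(x[::-1])


def reverse_words(text):
    # The list comprehension version from the question.
    return " ".join([x[::-1] for x in text.split(" ")])


print(reverse_words_loop_attempt("abc def"))  # c b a
print(reverse_words("abc def"))               # cba fed
```

So the loop version differs in two ways: it returns before reaching the second word, and it calls join on one reversed word instead of on a list of reversed words.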

Bonus: Can we avoid the list comprehension entirely for this function?

If you like one-liners:

>>> reverse_words = lambda text: " ".join(text[::-1].split()[::-1]) 
>>> reverse_words('abc def ghi')
'cba fed ihg'

But that just makes the code even less readable.
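
If the goal is only to avoid the comprehension, map is another option. This is a minimal sketch, and reverse_words_with_map is a name invented for this example:

```python
def reverse_words_with_map(text):
    # map applies the reversing lambda to each word lazily;
    # join consumes the resulting iterator directly.
    return " ".join(map(lambda word: word[::-1], text.split(" ")))


print(reverse_words_with_map("abc def ghi"))  # cba fed ihg
```

It keeps the same split(" ") call as the original function, so it treats repeated spaces the same way the comprehension version does.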

Upvotes: 2
