python list comprehension with cls

Question

I encountered a snippet of code like the following:

array = ['a', 'b', 'c']
ids = [array.index(cls.lower()) for cls in array]

I'm confused about two points:

What does [... for cls in array] mean? Since cls is a reserved keyword for classes, why not just use [... for s in array]?
Why bother writing something this complicated instead of just [i for i in range(len(array))]?

I believe this code was written by someone more experienced with Python than me, and I assume they had some reason for doing it this way...

Alain T. · Accepted Answer

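cls is not a reserved word for class. That would be a very poor choice of name by the language designer. Many programmers use it by convention (typically as the first parameter of a class method), but it is no more reserved than the parameter name self. In this comprehension it is just an ordinary loop variable.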
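One way to check this yourself is with the standard library's keyword module. The sketch below is only illustrative; the class Demo and the parameter name klass are made-up names, not something from the original snippet.

```python
import keyword

# "class" is a real keyword; "cls" and "self" are only naming conventions.
print(keyword.iskeyword("class"))
print(keyword.iskeyword("cls"))
print(keyword.iskeyword("self"))

# cls can be used like any other variable name.
cls = "A"
lowered = cls.lower()

# Hypothetical class: the first parameter of a classmethod can have any name.
class Demo:
    @classmethod
    def create(klass):
        return klass()

obj = Demo.create()
```

The first print shows True and the other two show False, which is why the comprehension in the question runs without any complaint from the interpreter.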

If you use distinct upper and lower case characters in the list, you will see the difference:

array = ['a', 'b', 'c', 'B','A','c']
ids = [array.index(cls.lower()) for cls in array]
print(ids)

[0, 1, 2, 1, 0, 2]

The value at index 3 is 1 instead of 3 because 'B' is lowercased to 'b', whose first occurrence is at index 1. Similarly, the value at the last position is 2 instead of 5 because the first 'c' is at index 2.

This list comprehension requires that the array always contain a lowercase instance of every uppercase letter. For example, ['a', 'B', 'c'] would make it fail with a ValueError, because array.index('b') finds no match. Hopefully there are other safeguards in the rest of the program to ensure that this requirement is always met.
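A minimal sketch of that failure, using the ['a', 'B', 'c'] list mentioned above. The try/except wrapper and the names ids and message are only there for illustration.

```python
array = ['a', 'B', 'c']

try:
    # 'B'.lower() is 'b', which does not appear in the list,
    # so list.index() raises ValueError.
    ids = [array.index(cls.lower()) for cls in array]
except ValueError as exc:
    ids = None
    message = str(exc)
    print("lookup failed:", message)
```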

A safer and more efficient way to write this would be to build a dictionary of character positions before going through the array to get indexes. Each array.index() call scans the list, so the original comprehension is O(n^2) overall; with a dictionary it becomes O(n). It also makes the process more robust.

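array     = ['a', 'b', 'c', 'B','A','c','Z']
# Walk the reversed list while counting from 1-len(array) up to 0, so that -i is the original index.
firstchar = {c:-i for i,c in enumerate(array[::-1],1-len(array))}
# Look up the lowercase form of each letter; .get() returns None when it is missing.
ids       = [firstchar.get(c.lower()) for c in array]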

print(ids)
[0, 1, 2, 1, 0, 2, None]

The firstchar dictionary contains the first index in array containing a given letter. It is built by going backward through the array so that the smallest index remains when there are multiple occurrences of the same letter.

{'Z': 6, 'c': 2, 'A': 4, 'B': 3, 'b': 1, 'a': 0}
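If the reversed enumerate is hard to read, the same mapping can probably be built by going forward and keeping only the first index seen for each letter. This is an alternative sketch, not the code from the answer; the names firstchar_forward and firstchar_reversed are introduced here for comparison.

```python
array = ['a', 'b', 'c', 'B', 'A', 'c', 'Z']

# Forward pass: setdefault() stores an index only the first time a letter is seen.
firstchar_forward = {}
for i, c in enumerate(array):
    firstchar_forward.setdefault(c, i)

# The reversed construction, for comparison.
firstchar_reversed = {c: -i for i, c in enumerate(array[::-1], 1 - len(array))}

# Dictionary equality ignores insertion order, so this compares the mappings only.
same = firstchar_forward == firstchar_reversed
print(same)
```

Both versions make a single pass over the list, so the choice between them is mostly a matter of readability.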

Then, going through the array to form ids, each character finds the corresponding index in O(1) time by using the dictionary.

Using the .get() method allows the list comprehension to survive an uppercase letter without a corresponding lowercase value in the list. In this example it returns None, but it could also be made to return the letter's own index or the index of the first uppercase instance by passing a default as the second argument to .get().
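The two fallbacks could look something like the sketch below. The extra 'Z' at the end of the list and the names ids_own and ids_first are additions for illustration, so that the two variants give visibly different results.

```python
array = ['a', 'b', 'c', 'B', 'A', 'c', 'Z', 'Z']
firstchar = {c: -i for i, c in enumerate(array[::-1], 1 - len(array))}

# Fallback 1: use the letter's own position when no lowercase form exists.
ids_own = [firstchar.get(c.lower(), i) for i, c in enumerate(array)]

# Fallback 2: use the first index of the same (uppercase) letter instead.
ids_first = [firstchar.get(c.lower(), firstchar[c]) for c in array]

print(ids_own)    # [0, 1, 2, 1, 0, 2, 6, 7]
print(ids_first)  # [0, 1, 2, 1, 0, 2, 6, 6]
```

Which fallback is appropriate depends on what the rest of the program expects the ids to mean, which the original snippet does not show.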
