re.sub emptying list

Question

def process_dialect_translation_rules():

    # Read in lines from the text file specified in sys.argv[1], stripping away
    # excess whitespace and discarding comments (lines that start with '##').
    f_lines = [line.strip() for line in open(sys.argv[1], 'r').readlines()]
    f_lines = filter(lambda line: not re.match(r'##', line), f_lines)

    # Remove any occurances of the pattern '\s*<=>\s*'. This leaves us with a 
    # list of lists. Each 2nd level list has two elements: the value to be 
    # translated from and the value to be translated to. Use the sub function
    # from the re module to get rid of those pesky asterisks.
    f_lines = [re.split(r'\s*<=>\s*', line) for line in f_lines]
    f_lines = [re.sub(r'"', '', elem) for elem in line for line in f_lines]

This function should take the lines from a file and perform some operations on the lines, such as removing any lines that begin with ##. Another operation that I wish to perform is to remove the quotation marks around the words in the line. However, when the final line of this script runs, f_lines becomes an empty lines. What happened?

Requested lines of original file:

##  English-Geek Reversible Translation File #1
##   (Moderate Geek)
##  Created by Todd WAreham, October 2009

"TV show"    <=> "STAR TREK"
"food"       <=> "pizza"
"drink"      <=> "Red Bull"
"computer"   <=> "TRS 80"
"girlfriend" <=> "significant other"

sth · Accepted Answer

In Python, multiple for loops in a list comprehension are handled from left to right, not from right to left, so your last expression should read:

[re.sub(r'"', '', elem) for line in f_lines for elem in line]

It doesn't lead to an error as it is, since list comprehensions leak the loop variable, so line is still in scope from the previous expression. If that line then is an empty string you get an empty list as result.

re.sub emptying list

Answers (2)

Related Questions