mau5padd

Reputation: 405

re.sub emptying list

def process_dialect_translation_rules():

    # Read in lines from the text file specified in sys.argv[1], stripping away
    # excess whitespace and discarding comments (lines that start with '##').
    f_lines = [line.strip() for line in open(sys.argv[1], 'r').readlines()]
    f_lines = filter(lambda line: not re.match(r'##', line), f_lines)

    # Split each line on the pattern '\s*<=>\s*'. This leaves us with a
    # list of lists. Each 2nd level list has two elements: the value to be
    # translated from and the value to be translated to. Use the sub function
    # from the re module to get rid of those pesky quotation marks.
    f_lines = [re.split(r'\s*<=>\s*', line) for line in f_lines]
    f_lines = [re.sub(r'"', '', elem) for elem in line for line in f_lines]

This function should read the lines of a file and process them, for example by removing any lines that begin with ##. I also want to remove the quotation marks around the words in each line. However, when the final line of this script runs, f_lines becomes an empty list. What happened?

Requested lines of original file:

##  English-Geek Reversible Translation File #1
##   (Moderate Geek)
##  Created by Todd WAreham, October 2009

"TV show"    <=> "STAR TREK"
"food"       <=> "pizza"
"drink"      <=> "Red Bull"
"computer"   <=> "TRS 80"
"girlfriend" <=> "significant other"

Upvotes: 2

Views: 282

Answers (2)

John Machin

Reputation: 82992

Your basic problem is that you have chosen an over-complicated way of doing things, and come unstuck. Use the simplest tool that will get the job done. You don't need filter, map, lambda, readlines, and all of those list comprehensions (one will do). Using re.match instead of startswith is overkill. So is using re.sub where str.replace would do the job.

import sys

with open(sys.argv[1]) as f:
    d = {}
    for line in f:
        line = line.strip()
        if not line: continue # empty line
        if line.startswith('##'): continue # comment line
        parts = line.split('<=>')
        assert len(parts) == 2 # or print an error message ...
        key, value = [part.strip('" ') for part in parts]
        assert key not in d # or print an error message ...
        d[key] = value

Bonus extra: You get to check for dodgy lines and duplicate keys.
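As a rough sketch of what replacing the asserts with real error messages might look like: the function name parse_rules, the sample list, and the wording of the ValueError messages are my own illustration, not part of the original script.

```python
def parse_rules(lines):
    """Build a translation dict from lines such as '"food" <=> "pizza"'."""
    rules = {}
    for lineno, line in enumerate(lines, 1):
        line = line.strip()
        if not line or line.startswith('##'):
            continue  # skip blank lines and comment lines
        parts = line.split('<=>')
        if len(parts) != 2:
            raise ValueError('line %d: expected exactly one <=>: %r' % (lineno, line))
        # Strip surrounding spaces and double quotes from both halves.
        key, value = [part.strip('" ') for part in parts]
        if key in rules:
            raise ValueError('line %d: duplicate key %r' % (lineno, key))
        rules[key] = value
    return rules


# Hypothetical sample input, modelled on the file in the question.
sample = [
    '##  (Moderate Geek)',
    '',
    '"TV show"    <=> "STAR TREK"',
    '"food"       <=> "pizza"',
]
rules = parse_rules(sample)
```

Taking any iterable of lines rather than a filename means the same function works on an open file object or on a plain list while testing.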

Upvotes: 0

sth

Reputation: 229754

In Python, multiple for loops in a list comprehension are handled from left to right, not from right to left, so your last expression should read:

[re.sub(r'"', '', elem) for line in f_lines for elem in line]

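The original doesn't raise an error because, in Python 2, list comprehensions leak their loop variable, so line is still in scope from the previous expression and holds the last element of f_lines. If that element is an empty string, iterating over it yields nothing and you get an empty list as the result. (In Python 3, comprehension variables no longer leak, so the same code would typically raise a NameError instead.)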
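To make the clause ordering concrete, here is a small sketch using a made-up list called pairs (not the asker's data file). It shows that the for clauses correspond to nested loops written in the same left-to-right order. Note that this form flattens the result; if the list of lists should be kept, nesting one comprehension inside another is one option.

```python
import re

# Hypothetical stand-in for f_lines after the re.split step.
pairs = [['"food"', '"pizza"'], ['"drink"', '"Red Bull"']]

# Comprehension form: the for clauses read left to right, outermost first.
flat = [re.sub(r'"', '', elem) for pair in pairs for elem in pair]

# Equivalent nested loops, in the same order as the for clauses above.
flat_loops = []
for pair in pairs:
    for elem in pair:
        flat_loops.append(re.sub(r'"', '', elem))

# To keep the two-level structure, nest the comprehensions instead.
nested = [[re.sub(r'"', '', elem) for elem in pair] for pair in pairs]
```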

Upvotes: 2
