Reputation: 405
import re
import sys

def process_dialect_translation_rules():
    # Read in lines from the text file specified in sys.argv[1], stripping away
    # excess whitespace and discarding comments (lines that start with '##').
    f_lines = [line.strip() for line in open(sys.argv[1], 'r').readlines()]
    f_lines = filter(lambda line: not re.match(r'##', line), f_lines)
    # Remove any occurrences of the pattern '\s*<=>\s*'. This leaves us with a
    # list of lists. Each 2nd level list has two elements: the value to be
    # translated from and the value to be translated to. Use the sub function
    # from the re module to get rid of those pesky asterisks.
    f_lines = [re.split(r'\s*<=>\s*', line) for line in f_lines]
    f_lines = [re.sub(r'"', '', elem) for elem in line for line in f_lines]
This function should read the lines of a file and process them, for example by removing any lines that begin with ##. I also want to remove the quotation marks around the words in each line. However, after the final line of this script runs, f_lines is an empty list. What happened?
Requested lines of original file:
## English-Geek Reversible Translation File #1
## (Moderate Geek)
## Created by Todd WAreham, October 2009
"TV show" <=> "STAR TREK"
"food" <=> "pizza"
"drink" <=> "Red Bull"
"computer" <=> "TRS 80"
"girlfriend" <=> "significant other"
Upvotes: 2
Views: 282
Reputation: 82992
Your basic problem is that you have chosen an over-complicated way of doing things, and come unstuck. Use the simplest tool that will get the job done. You don't need filter, map, lambda, readlines, and all of those list comprehensions (one will do). Using re.match instead of startswith is overkill. So is using re.sub where str.replace would do the job.
import sys

with open(sys.argv[1]) as f:
    d = {}
    for line in f:
        line = line.strip()
        if not line: continue  # empty line
        if line.startswith('##'): continue  # comment line
        parts = line.split('<=>')
        assert len(parts) == 2  # or print an error message ...
        key, value = [part.strip('" ') for part in parts]
        assert key not in d  # or print an error message ...
        d[key] = value
Bonus extra: You get to check for dodgy lines and duplicate keys.
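If you would rather get an error message than a failed assert, the same loop can be wrapped in a function along these lines. This is only a rough sketch: the name parse_rules, the ValueError messages and the sample list are my own choices for illustration, not part of your script, and it assumes Python 3.

```python
def parse_rules(lines):
    """Build a dict from lines of the form '"key" <=> "value"'."""
    rules = {}
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        # Skip empty lines and comment lines.
        if not line or line.startswith('##'):
            continue
        parts = line.split('<=>')
        if len(parts) != 2:
            raise ValueError(f'line {lineno}: expected exactly one <=>: {line!r}')
        # Strip surrounding spaces and double quotes from both sides.
        key, value = (part.strip('" ') for part in parts)
        if key in rules:
            raise ValueError(f'line {lineno}: duplicate key {key!r}')
        rules[key] = value
    return rules


# A few lines taken from the file shown in the question.
sample = [
    '## (Moderate Geek)',
    '"TV show" <=> "STAR TREK"',
    '"food" <=> "pizza"',
    '',
]
rules = parse_rules(sample)
```

Because it accepts any iterable of strings, you can pass it an open file object directly, or a plain list when testing.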
Upvotes: 0
Reputation: 229754
In Python, multiple for loops in a list comprehension are handled from left to right, not from right to left, so your last expression should read:
[re.sub(r'"', '', elem) for line in f_lines for elem in line]
It doesn't lead to an error as it is, since list comprehensions leak the loop variable in Python 2, so line is still in scope from the previous expression. If that line is then an empty string, you get an empty list as the result.
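To make the ordering concrete, here is a small sketch that compares the comprehension with the equivalent nested loops. The rules list is made-up sample data standing in for your already-split f_lines, and the code assumes Python 3.

```python
import re

# Hypothetical sample: each inner list is one already-split translation rule.
rules = [['"food"', '"pizza"'], ['"drink"', '"Red Bull"']]

# The for clauses appear in the same order as the equivalent nested loops.
flat = [re.sub(r'"', '', elem) for line in rules for elem in line]

# Equivalent explicit loops: outer loop first, inner loop second.
expanded = []
for line in rules:
    for elem in line:
        expanded.append(re.sub(r'"', '', elem))

# A nested comprehension keeps the two-element pairs instead of flattening.
pairs = [[re.sub(r'"', '', elem) for elem in line] for line in rules]
```

Note that the single comprehension with two for clauses flattens everything into one list of strings. If you want to keep the list of two-element lists described in your comment, the nested form in pairs is probably closer to what you are after.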
Upvotes: 2