Reputation: 545
I wrote some code to remove all parentheses in a text file, along with the text between them, and to collapse runs of whitespace.
However, I have very little experience with Python, and it's quite obvious that my code is inefficient.
What's the best way to do what I want?
import re
lines = open('test.txt', 'r+')
lines = [re.sub(r'\s+', ' ', line) for line in lines]  # this is to kill 'tab' whitespaces
lines = [re.sub(r' +', ' ', line) for line in lines]  # regular whitespace, if more than 1
lines = [re.sub(r'\(.*?\)', '', line) for line in lines]  # brackets and the text
with open('test2.txt', 'w') as out:
    out.writelines(lines)
Upvotes: 0
Views: 127
Reputation: 1000
If the file has enough lines to offset the cost of compiling the regexes, compiling each pattern once and reusing it, as in the following, should serve.
#!/usr/bin/env python
import re

if __name__ == "__main__":
    lines = {' foo (bar) '}
    parens_regex = re.compile(r'\(.*?\)')  # Non-greedy
    space_regex = re.compile(r'\s+')
    for line in lines:
        print('Before: "%s"' % line)
        # Remove parens before collapsing whitespace so the space around them collapses too
        line_tmp = parens_regex.sub('', line)
        line_tmp = space_regex.sub(' ', line_tmp)
        line_tmp = line_tmp.strip()
        print('After: "%s"' % line_tmp)  # Prints: After: "foo"
I guess it's questionable whether that's more elegant - probably not.
You already knew enough about regexes to make your parens regex non-greedy.
But maybe a future Stack Overflow reader doesn't. Or maybe they or you didn't know about compiling regexes...
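For the file-to-file case in the question, the same idea might look roughly like the sketch below. The names `clean_line` and `clean_file` are made up for illustration, and the sketch assumes you want to keep one output line per input line, so it collapses only spaces and tabs rather than all of `\s` (which would also swallow the newlines).

```python
import re

# Compiled once and reused for every line.
PARENS_RE = re.compile(r'\(.*?\)')  # non-greedy: stops at the first ')'
SPACES_RE = re.compile(r'[ \t]+')   # runs of spaces and tabs, but not newlines


def clean_line(line):
    """Drop parenthesised text, then collapse runs of spaces/tabs."""
    line = PARENS_RE.sub('', line)
    line = SPACES_RE.sub(' ', line)
    return line.strip()  # also removes the trailing newline


def clean_file(src_path, dst_path):
    """Stream src_path into dst_path one cleaned line at a time."""
    with open(src_path) as src, open(dst_path, 'w') as dst:
        for line in src:
            dst.write(clean_line(line) + '\n')
```

Calling `clean_file('test.txt', 'test2.txt')` would then mirror the original script without holding the whole file in memory. Note that this does not handle nested parentheses or a pair of parentheses that spans more than one line.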
Upvotes: 1