Reputation: 73
I have written two functions in Python. When I run replace(), it looks at the dictionary named replacements. It takes each key, searches the document for it, and wherever the key matches text in the document, replaces that text with the corresponding value.
The problem seems to be that I also have the reverse mappings ('stopped' changes to 'suspended' and 'suspended' changes to 'stopped', depending on what is in the text file), so as it goes through the file some words are changed and then changed back (i.e. no changes are made).
When I run replace2(), I take each word from the text document and check whether it is a key in replacements. If it is, I replace it. What I have noticed, though, is that when I run this, "suspended" (which contains the substring "ended") ends up as "suspfinished".
Is there an easier way to iterate through the text file and change each word only once, if found? I think replace2() does what I want, although I'm losing phrases, but it also seems to pick up substrings, which it should not, as I did use the split() function.
def replace():
    fileinput = open('tennis.txt').read()
    out = open('tennis.txt', 'w')
    for i in replacements.keys():
        # each pass operates on the output of the previous pass
        fileinput = fileinput.replace(i, replacements[i])
        print(i, " : ", replacements[i])
    out.write(fileinput)
    out.close()

def replace2():
    fileinput = open('tennis.txt').read()
    out = open('tennis.txt', 'w')
    #for line in fileinput:
    for word in fileinput.split():
        for i in replacements.keys():
            print(i)
            if word == i:
                # str.replace acts on the whole text, not only on this word
                fileinput = fileinput.replace(word, replacements[i])
    out.write(fileinput)
    out.close()

replacements = {
    'suspended' : 'stopped',
    'stopped' : 'suspended',
    'due to' : 'because of',
    'ended' : 'finished',
    'finished' : 'ended',
    '40' : 'forty',
    'forty' : '40',
    'because of' : 'due to' }
the match ended due to rain a mere 40 minutes after it started. it was suspended because of rain.
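Both symptoms can be reproduced in isolation. The following is only a minimal sketch: it uses a shortened subset of the dictionary and a made-up sentence (both are assumptions for illustration, not the real file), and it relies on dictionaries keeping insertion order (Python 3.7+).

```python
# Shortened, illustrative subset of the replacements dictionary.
replacements = {
    'suspended': 'stopped',
    'stopped': 'suspended',
    'ended': 'finished',
}

text = "the match ended. it was suspended."

chained = text
for old, new in replacements.items():
    # Every call works on the result of the previous call and matches
    # substrings anywhere in the text, not whole words.
    chained = chained.replace(old, new)

print(chained)
```

Here 'suspended' becomes 'stopped', the next pass turns it straight back into 'suspended', and the final pass replaces the 'ended' inside it, which gives "suspfinished".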
Upvotes: 0
Views: 113
Reputation: 98
Here is an improved version of rawbeans' answer. That answer didn't work as expected because some of your replacement keys contain multiple words.
Tested with your example line, it outputs: the match finished because of rain a mere forty minutes after it started. it was stopped due to rain.
import re

def replace2():
    fileinput = open('tennis.txt').read()
    out = open('tennisout.txt', 'w')
    #for line in fileinput:
    # try the replacement keys first, then fall back to plain words,
    # single non-word characters, and punctuation
    wordpats = '|'.join(replacements.keys())
    pattern = r'({0}|\w+|\W|[.,!?;-_])'.format(wordpats)
    # split the text into tokens that cover every character
    words = re.findall(pattern, fileinput)
    # look each token up once; unknown tokens pass through unchanged
    output = "".join(replacements.get(x, x) for x in words)
    out.write(output)
    out.close()

replacements = {
    'suspended' : 'stopped',
    'stopped' : 'suspended',
    'due to' : 'because of',
    'ended' : 'finished',
    'finished' : 'ended',
    '40' : 'forty',
    'forty' : '40',
    'because of' : 'due to' }

if __name__ == '__main__':
    replace2()
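A possible variant of the same single-pass idea, in case it helps: let re.sub do the lookup through a callback, with \b word boundaries so a key only matches as a whole word or phrase. This is a sketch rather than a drop-in replacement; the function name swap_words and the longest-key-first ordering are my own choices, and it works on a string instead of reading tennis.txt.

```python
import re

replacements = {
    'suspended': 'stopped',
    'stopped': 'suspended',
    'due to': 'because of',
    'ended': 'finished',
    'finished': 'ended',
    '40': 'forty',
    'forty': '40',
    'because of': 'due to',
}

def swap_words(text, mapping):
    """Replace every key of mapping in one pass (hypothetical helper)."""
    # Longest keys first, so a longer key is tried before a shorter one
    # that could match at the same position.
    keys = sorted(mapping, key=len, reverse=True)
    # re.escape guards against keys containing regex metacharacters;
    # \b keeps 'ended' from matching inside 'suspended'.
    pattern = re.compile(r'\b(?:' + '|'.join(re.escape(k) for k in keys) + r')\b')
    # The callback sees only the original text, so replaced words are
    # never scanned again and cannot be swapped back.
    return pattern.sub(lambda m: mapping[m.group(0)], text)

line = ("the match ended due to rain a mere 40 minutes after it started. "
        "it was suspended because of rain.")
print(swap_words(line, replacements))
```

On the example line this produces the same output as quoted above, and spacing and punctuation are left untouched because only the matched spans are rewritten.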
Upvotes: 1
Reputation: 2549
To account for punctuation, use a regular expression instead of split():
output = " ".join(replacements.get(x, x) for x in re.findall(r"[\w']+|[.,!?;]", fileinput))
out.write(output)
This way, punctuation will be ignored during the replace, but will be present in the final string. See this post for an explanation and potential caveats.
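To make the caveats concrete, here is a small sketch of what the tokenizer returns for part of the example sentence. The dictionary is a shortened subset and the variable names are only for illustration.

```python
import re

# Shortened, illustrative subset of the replacements dictionary.
replacements = {
    'suspended': 'stopped',
    'because of': 'due to',
}

text = "it was suspended because of rain."

# Words and punctuation marks come back as separate tokens.
tokens = re.findall(r"[\w']+|[.,!?;]", text)
output = " ".join(replacements.get(x, x) for x in tokens)

print(tokens)
print(output)
```

Two things to watch for: joining with " " also puts a space in front of the punctuation token, and a multi-word key such as 'because of' is never matched because it arrives as two separate tokens.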
Upvotes: 0
Reputation: 44474
is there an easier way to iterate through the text file and only change the word once, if found?
There's a much simpler way:
output = " ".join(replacements.get(x, x) for x in fileinput.split())
out.write(output)
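A quick sketch of how this behaves, assuming a shortened subset of the dictionary and a made-up two-line input (neither is from the real file):

```python
# Shortened, illustrative subset of the replacements dictionary.
replacements = {
    'suspended': 'stopped',
    'ended': 'finished',
}

fileinput = "play was suspended.\nit ended early"

# split() cuts on any whitespace, so each token is looked up exactly once.
output = " ".join(replacements.get(x, x) for x in fileinput.split())

print(output)
```

Each word is changed at most once and substrings are never touched, but note the limits: a word with punctuation attached ('suspended.') is not found in the dictionary, multi-word keys cannot match, and line breaks are collapsed into single spaces.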
Upvotes: 0