user1031551

Reputation: 73

Python search and replace

I have written two functions in Python. When I run replace(), it looks at the data structure named replacements. It takes the key, iterates through the document and when it matches a key to a word in the document, it replaces the word with the value.

What seems to be happening is this: because I also have the reverse mappings ('stopped' changes to 'suspended' and 'suspended' changes to 'stopped', depending on what is in the text file), some words are changed and then changed back as the loop works through the keys, so in effect no changes are made.

When I run replace2(), I take each word from the text document and check whether it is a key in replacements. If it is, I replace it. What I have noticed, though, is that when I run this, "suspended" (which contains the substring "ended") ends up as "suspfinished".

Is there an easier way to iterate through the text file and change each word only once, if found? I think replace2() does what I want, although I'm losing phrases, but it also seems to pick up substrings, which it should not, since I used the split() function.

def replace():
        fileinput = open('tennis.txt').read()
        out = open('tennis.txt', 'w')
        for i in replacements.keys():
            fileinput = fileinput.replace(i, replacements[i])
            print(i, " : ", replacements[i])
        out.write(fileinput)
        out.close()


def replace2():
        fileinput = open('tennis.txt').read()
        out = open('tennis.txt', 'w')
        #for line in fileinput:
        for word in fileinput.split():
            for i in replacements.keys():
                print(i)
                if word == i:
                    fileinput = fileinput.replace(word, replacements[i])
        out.write(fileinput)
        out.close()

replacements = {
    'suspended'    : 'stopped',
    'stopped'      : 'suspended',
    'due to'       : 'because of',
    'ended'        : 'finished',
    'finished'     : 'ended',
    '40'           : 'forty',
    'forty'        : '40',
    'because of'   : 'due to' }

the match ended due to rain a mere 40 minutes after it started. it was suspended because of rain.

Upvotes: 0

Views: 113

Answers (3)

Çağlar Kutlu

Reputation: 98

This is an improved version of rawbeans' answer. That answer did not work as expected because some of your replacement keys contain multiple words.

Tested with your example line and it outputs: the match finished because of rain a mere forty minutes after it started. it was stopped due to rain.

import re

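def replace2():
    # Read the whole input file up front; the result goes to a separate file.
    with open('tennis.txt') as f:
        fileinput = f.read()

    # Try the replacement keys first, so multi-word keys such as 'due to'
    # match as one token, then fall back to whole words (\w+) and single
    # non-word characters (\W). Keys are escaped in case they contain
    # regex metacharacters.
    wordpats = '|'.join(re.escape(k) for k in replacements)
    pattern = r'({0}|\w+|\W)'.format(wordpats)
    words = re.findall(pattern, fileinput)

    # Each token is looked up exactly once, so nothing is swapped back.
    output = "".join(replacements.get(x, x) for x in words)
    with open('tennisout.txt', 'w') as out:
        out.write(output)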


replacements = {
    'suspended'    : 'stopped',
    'stopped'      : 'suspended',
    'due to'       : 'because of',
    'ended'        : 'finished',
    'finished'     : 'ended',
    '40'           : 'forty',
    'forty'        : '40',
    'because of'   : 'due to' }


if __name__ == '__main__':
    replace2()
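
For comparison, here is a minimal sketch of a related single-pass approach that uses re.sub with a callback instead of re.findall plus join. It is not the code from this answer: the function name swap_phrases and the decision to sort keys longest-first are my own assumptions for illustration. The \b word boundaries are there so that a key such as 'ended' cannot match inside 'suspended'.

```python
import re

# Same table as in the question.
replacements = {
    'suspended': 'stopped',
    'stopped': 'suspended',
    'due to': 'because of',
    'ended': 'finished',
    'finished': 'ended',
    '40': 'forty',
    'forty': '40',
    'because of': 'due to',
}


def swap_phrases(text, table):
    """Hypothetical helper: replace every key of `table` in one pass."""
    # Longest keys first, so a longer phrase wins over a shorter key
    # that happens to share its prefix.
    keys = sorted(table, key=len, reverse=True)
    # \b on both sides restricts matches to whole words or phrases.
    pattern = re.compile(
        r'\b(?:' + '|'.join(re.escape(k) for k in keys) + r')\b'
    )
    # re.sub scans left to right once; replaced text is never rescanned,
    # so 'stopped' -> 'suspended' is not flipped back again.
    return pattern.sub(lambda m: table[m.group(0)], text)


line = ("the match ended due to rain a mere 40 minutes after it started. "
        "it was suspended because of rain.")
print(swap_phrases(line, replacements))
# the match finished because of rain a mere forty minutes after it started. it was stopped due to rain.
```

Because the text is returned rather than written to a file, the function can be checked on a single line before it is pointed at tennis.txt.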

Upvotes: 1

roob

Reputation: 2549

To account for punctuation, use a regular expression instead of split():

import re

output = " ".join(replacements.get(x, x) for x in re.findall(r"[\w']+|[.,!?;]", fileinput))
out.write(output)

This way, punctuation will be ignored during the replace, but will be present in the final string. See this post for an explanation and potential caveats.
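
As a rough illustration of what the tokenising step produces, the sketch below runs the same regular expression on a short made-up sentence. The sample text, the trimmed-down replacements table and the final tidy-up step are assumptions of mine, not part of the original answer. It also shows one caveat: joining with " " leaves a space in front of each punctuation mark, which a second re.sub can remove.

```python
import re

# Subset of the question's table, kept short for the example.
replacements = {'ended': 'finished', 'suspended': 'stopped', '40': 'forty'}

text = "it was suspended, then it ended."

# Words (including apostrophes) and the listed punctuation marks
# become separate tokens; whitespace is dropped.
tokens = re.findall(r"[\w']+|[.,!?;]", text)
print(tokens)
# ['it', 'was', 'suspended', ',', 'then', 'it', 'ended', '.']

# Each token is looked up once, so punctuation no longer blocks a match.
joined = " ".join(replacements.get(t, t) for t in tokens)
print(joined)
# it was stopped , then it finished .

# Optional tidy-up (my addition): pull punctuation back onto the previous word.
tidy = re.sub(r"\s+([.,!?;])", r"\1", joined)
print(tidy)
# it was stopped, then it finished.
```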

Upvotes: 0

UltraInstinct

Reputation: 44474

is there an easier way to iterate through the text file and only change the word once, if found?

There's a much simpler way:

output = " ".join(replacements.get(x, x) for x in fileinput.split())
out.write(output)
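
To see what this does with the sample line from the question, here is a small self-contained sketch. The variable names line, attached and attached_out are mine. It suggests that single-word keys are swapped exactly once, while multi-word keys such as 'due to' and words with punctuation attached are left alone, because split() only cuts on whitespace.

```python
# Same table as in the question.
replacements = {
    'suspended': 'stopped',
    'stopped': 'suspended',
    'due to': 'because of',
    'ended': 'finished',
    'finished': 'ended',
    '40': 'forty',
    'forty': '40',
    'because of': 'due to',
}

line = ("the match ended due to rain a mere 40 minutes after it started. "
        "it was suspended because of rain.")

# Each whitespace-separated token is looked up once; unknown tokens pass through.
output = " ".join(replacements.get(w, w) for w in line.split())
print(output)
# the match finished due to rain a mere forty minutes after it started. it was stopped because of rain.

# 'ended.' is not a key (the full stop is part of the token), so it stays as is.
attached = "it ended."
attached_out = " ".join(replacements.get(w, w) for w in attached.split())
print(attached_out)
# it ended.
```

Note also that split() followed by " ".join collapses line breaks and repeated spaces into single spaces, which may or may not matter for the file.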

Upvotes: 0
