Text replacement doesn't work in special cases

Question

I have a word list file, named Words.txt, which contains hundreds of words, and a few subtitle files (.srt). I would like to go through all subtitle files, and search them for all of the words in the word list file. If a word is found, I'd like to change its color to green. This is the code:

import fileinput
import os
import re

wordsPath = 'C:/Users/John/Desktop/Subs/Words.txt'
subsPath = 'C:/Users/John/Desktop/Subs/Season1'
wordList = []

wordFile = open(wordsPath, 'r')
for line in wordFile:
    line = line.strip()
    wordList.append(line)

for word in wordList:
    for root, dirs, files in os.walk(subsPath, topdown=False):
        for fileName in files:
            if fileName.endswith(".srt"):
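                path = os.path.join(root, fileName)  # names from os.walk are relative to root
                with open(path, 'r') as file:
                    filedata = file.read()
                    filedata = filedata.replace(' ' + word + ' ', ' ' + '<font color="green">' + word + '</font>' + ' ')
                with open(path, 'w') as file:
                    file.write(filedata)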

Say the word "book" is in the list and is found in one of the subtitle files. As long as the word appears in a sentence like "This book is amazing", my code works perfectly fine. However, when the word appears as "BOOK" or "Book", or when it is at the beginning or the end of a sentence, the code fails. How can I solve this problem?

Dani Mesejo · Accepted Answer

You are using str.replace. From the documentation:

Return a copy of the string with all occurrences of substring old replaced by new

Here an occurrence means an exact match of the string old, so the function only replaces the word when it is lowercase and has a single space on each side, for example ' book '. That is different from ' BOOK ', ' Book ' and ' book'. Here are a few cases that don't match:

" book " == " BOOK "  # False
" book " == " book"  # False
" book " == " Book "  # False
" book " == " bOok " # False
" book " == "   book " # False

One alternative is to use a regex like this:

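import re

words = ["book", "rule"]
sentences = ["This book is amazing", "The not so good book", "OMG what a great BOOK", "One Book to rule them all",
             "Just book."]

# \b matches at word boundaries, so a word is also found at the start or end of a sentence and next to punctuation;
# re.escape keeps any special characters in a word literal
patterns = [re.compile(r"\b({})\b".format(re.escape(word)), re.IGNORECASE | re.UNICODE) for word in words]

for sentence in sentences:
    result = sentence
    for pattern in patterns:
        # \1 re-inserts the matched text, so the original capitalization is kept
        result = pattern.sub(r'<font color="green">\1</font>', result)
    print(result)

Output

This <font color="green">book</font> is amazing
The not so good <font color="green">book</font>
OMG what a great <font color="green">BOOK</font>
One <font color="green">Book</font> to <font color="green">rule</font> them all
Just <font color="green">book</font>.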
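Applied to the files from the question, the same idea might look roughly like the sketch below. It is not a definitive implementation: the function names build_pattern, highlight and highlight_folder are hypothetical, the `<font color="green">` tag is an assumption about how the subtitles should be colored, and the files are read with the platform's default encoding, as in the question. All words are combined into one pattern so each file is read and written once instead of once per word.

```python
import os
import re

def build_pattern(words):
    # Longest entries first, so a multi-word entry such as "book club"
    # is preferred over the shorter entry "book".
    ordered = sorted(words, key=len, reverse=True)
    alternatives = "|".join(re.escape(w) for w in ordered)
    return re.compile(r"\b({})\b".format(alternatives), re.IGNORECASE | re.UNICODE)

def highlight(text, pattern, color="green"):
    # \1 re-inserts the matched text, keeping its original capitalization.
    # The <font> tag is an assumption about the desired markup.
    return pattern.sub(r'<font color="{}">\1</font>'.format(color), text)

def highlight_folder(words_path, subs_path):
    with open(words_path, 'r') as word_file:
        words = [line.strip() for line in word_file if line.strip()]
    if not words:
        return  # an empty pattern would match at every word boundary
    pattern = build_pattern(words)
    for root, dirs, files in os.walk(subs_path):
        for file_name in files:
            if file_name.endswith(".srt"):
                # names from os.walk are relative to root
                path = os.path.join(root, file_name)
                with open(path, 'r') as f:
                    data = f.read()
                with open(path, 'w') as f:
                    f.write(highlight(data, pattern))
```

Calling highlight_folder('C:/Users/John/Desktop/Subs/Words.txt', 'C:/Users/John/Desktop/Subs/Season1') would rewrite every .srt file in place, so it is worth trying on a copy first. Running it twice on the same files would wrap the already highlighted words a second time.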
