Dinakar

Reputation: 329

Replace Words in a line from a file in python

I am trying to replace certain words in a file that has multiple lines. Below is the code I wrote. Please note that I am still learning Python.

ParsedUnFormattedFile = io.open("test.txt", "r", encoding="utf-8", closefd=True).read()

remArticles = {' a ':'', ' the ':'', ' and ':'', ' an ':''}

for line in ParsedUnFormattedFile:
    for i in remArticles.keys():
           words = line.split()
           ParsedReplacementFile = ParsedUnFormattedFile.replace(words,remArticles[i])

    FormattedFileForIndexing =  io.open("FormattedFileForIndexing.txt", "w", encoding="utf-8", closefd=True)
    FormattedFileForIndexing.write(ParsedReplacementFile)

If I do the replacement directly on each line as it is read, only one of the words gets replaced. On my system it is usually 'the'.

So I wanted to split each line, look at every word, and then replace it. However, I get the error below:

line 14, in <module>
    ParsedReplacementFile = ParsedUnFormattedFile.replace(words,remArticles[i])
TypeError: coercing to Unicode: need string or buffer, list found

How can I get this rectified?

Thanks

Upvotes: 1

Views: 1566

Answers (2)

Aran-Fey

Reputation: 43246

There are a number of problems.

  1. ParsedUnFormattedFile is a string, not a file, because you've called .read(). That means your for line in ParsedUnFormattedFile loop does not iterate over the lines in the file, but over the individual characters.
  2. Each time the for i in remArticles.keys(): loop runs, a new value is assigned to ParsedReplacementFile, always computed from the original text. Only the last replacement is retained.
  3. You're overwriting the file FormattedFileForIndexing.txt in each iteration of your for line in ParsedUnFormattedFile: loop.
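The first two points can be reproduced without any file. The following is a minimal sketch, assuming Python 3; the sample strings are made up for illustration, and a single space is used as the replacement only to keep the words apart in the output.

```python
# Sample text standing in for the result of .read() (hypothetical data).
text = "the cat\nand a dog\n"

# Point 1: iterating over a str yields single characters, not lines.
first_three = [piece for piece in text][:3]
print(first_three)  # ['t', 'h', 'e']

# splitlines() (or iterating over the open file object) yields whole lines.
lines = text.splitlines()
print(lines)  # ['the cat', 'and a dog']

# Point 2: replacing from the original each time keeps only the last result.
source = "eat the cake and pie"
last_only = source
accumulated = source
for key in (" the ", " and "):
    last_only = source.replace(key, " ")         # always starts from source
    accumulated = accumulated.replace(key, " ")  # builds on the previous result

print(last_only)    # eat the cake pie
print(accumulated)  # eat cake pie
```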

It's probably best to redo everything from scratch.

import io

remArticles = {' a ':'', ' the ':'', ' and ':'', ' an ':''}

# Open both files once; the with blocks close them automatically.
with io.open("test.txt", "r", encoding="utf-8") as ParsedUnFormattedFile:
    with io.open("FormattedFileForIndexing.txt", "w", encoding="utf-8") as FormattedFileForIndexing:
        for line in ParsedUnFormattedFile:
            # Apply every replacement to the same line, one after another.
            for i in remArticles:
                line = line.replace(i, remArticles[i])
            FormattedFileForIndexing.write(line)
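One thing to watch with the keys as written: replacing ' a ' with an empty string also removes the spaces on both sides, so the neighbouring words get glued together, and an article at the very start of a line is never matched. A possible alternative, not part of the original answer, is a regular expression with word boundaries. This is a minimal sketch, assuming Python 3; ARTICLES, ARTICLE_PATTERN and strip_articles are hypothetical names chosen for illustration.

```python
import re

# Words to drop (same four as in remArticles, without the padding spaces).
ARTICLES = ("a", "an", "the", "and")

# \b matches only at word boundaries, so the "a" inside "cat" is left alone.
# The trailing [ \t]* swallows the spaces after the removed word but not the newline.
ARTICLE_PATTERN = re.compile(
    r"\b(?:" + "|".join(ARTICLES) + r")\b[ \t]*", re.IGNORECASE
)

def strip_articles(line):
    """Return line with the listed words removed (hypothetical helper)."""
    return ARTICLE_PATTERN.sub("", line)

# Plain str.replace with space-padded keys glues the neighbours together.
glued = "eat a pie".replace(" a ", "")
print(glued)  # eatpie

cleaned = strip_articles("The cat and a dog ate an apple\n")
print(repr(cleaned))  # 'cat dog ate apple\n'
```

Inside the file loop, line = strip_articles(line) would take the place of the inner for loop.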

Upvotes: 1

Adam Hughes

Reputation: 16309

split() returns a list, and replace() expects a string as its first argument, which is why you get the TypeError.

>>> 'a b c asd sas'.split()
['a', 'b', 'c', 'asd', 'sas']

Instead, call replace() before you split, or join the list back into a string and then replace. To join a list into a string:

words = ''.join(words)

EG:

>>> ''.join(['a','b','c'])
'abc'
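Note that ''.join puts nothing between the items, so words that were split on whitespace come back run together. The following is a minimal sketch, assuming Python 3, that compares the two separators, shows that replace() rejects a list, and filters the word list before joining it back; the stop_words set and the sample sentences are made up for illustration.

```python
words = "a b c asd sas".split()

# An empty separator runs the words together; a space keeps them apart.
joined_tight = "".join(words)
joined_spaced = " ".join(words)
print(joined_tight)   # abcasdsas
print(joined_spaced)  # a b c asd sas

# str.replace needs a string as its first argument; a list raises TypeError.
try:
    "a b c".replace(words, "")
    error_name = None
except TypeError as exc:
    error_name = type(exc).__name__
print(error_name)  # TypeError

# Split, drop the unwanted words, then join with a space.
stop_words = {"a", "an", "the", "and"}  # hypothetical set for illustration
kept = " ".join(w for w in "the cat and a dog".split() if w not in stop_words)
print(kept)  # cat dog
```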

Upvotes: 1
