Reputation: 329
I am trying to replace certain words in a file that has multiple lines. Below is the code I wrote. Please note that I am still learning Python.
ParsedUnFormattedFile = io.open("test.txt", "r", encoding="utf-8", closefd=True).read()
remArticles = {' a ':'', ' the ':'', ' and ':'', ' an ':''}
for line in ParsedUnFormattedFile:
    for i in remArticles.keys():
        words = line.split()
        ParsedReplacementFile = ParsedUnFormattedFile.replace(words,remArticles[i])
    FormattedFileForIndexing = io.open("FormattedFileForIndexing.txt", "w", encoding="utf-8", closefd=True)
    FormattedFileForIndexing.write(ParsedReplacementFile)
If I do the replacement directly on each line as I read it, only one of the words gets replaced. On my system it is usually 'the'.
So I wanted to split each line and check every word before replacing it. However, I get the error below:
line 14, in <module>
ParsedReplacementFile = ParsedUnFormattedFile.replace(words,remArticles[i])
TypeError: coercing to Unicode: need string or buffer, list found
How can I get this rectified?
Thanks
Upvotes: 1
Views: 1566
Reputation: 43246
There are a number of problems.
- ParsedUnFormattedFile is a string, not a file, because you've called .read(). That means your "for line in ParsedUnFormattedFile" loop does not iterate through the lines in the file, but through the individual characters.
- Every time the "for i in remArticles.keys():" loop runs, a new value is assigned to ParsedReplacementFile. It will only retain the last one.
- You reopen (and therefore overwrite) FormattedFileForIndexing.txt in each iteration of your "for line in ParsedUnFormattedFile:" loop.
It's probably best to redo everything from scratch.
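import io

remArticles = {' a ':'', ' the ':'', ' and ':'', ' an ':''}
# Open both files once; the with blocks close them automatically.
with io.open("test.txt", "r", encoding="utf-8") as ParsedUnFormattedFile:
    with io.open("FormattedFileForIndexing.txt", "w", encoding="utf-8") as FormattedFileForIndexing:
        # Iterating over the file object yields one line at a time.
        for line in ParsedUnFormattedFile:
            # Apply every replacement to the same line so none is lost.
            for i in remArticles:
                line = line.replace(i, remArticles[i])
            FormattedFileForIndexing.write(line)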
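One caveat with the loop above: replacing ' the ' with an empty string also removes the spaces on both sides, so the neighbouring words end up fused, and a word at the very start of a line is never matched. A minimal sketch of one possible alternative, using the re module with word boundaries, is below. The helper name strip_articles and the ARTICLES tuple are hypothetical, made up for illustration; they are not part of the original code.

```python
import re

# Hypothetical word list: the keys of remArticles without the padding spaces.
ARTICLES = ("a", "an", "the", "and")

# \b matches only at word boundaries, so "the" is matched but "there" is not.
# The optional [ \t] consumes one following space or tab, which avoids
# leaving a double space behind after a word is removed.
_ARTICLE_RE = re.compile(r"\b(?:%s)\b[ \t]?" % "|".join(ARTICLES), re.IGNORECASE)


def strip_articles(line):
    """Return line with the listed words removed (illustrative helper)."""
    return _ARTICLE_RE.sub("", line)
```

Inside the file loop, writing strip_articles(line) instead of the result of the inner replace loop would be the only change needed. Note that this sketch ignores case, which the original dictionary-based version does not.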
Upvotes: 1
Reputation: 16309
split() returns a list, while replace() expects a string, which is why you get the TypeError.
>>> 'a b c asd sas'.split()
['a', 'b', 'c', 'asd', 'sas']
Instead, replace before you split, or join the list back into a string and then replace. To join a list into a string, use a space as the separator so the words stay apart:
words = ' '.join(words)
For example:
>>> ' '.join(['a','b','c'])
'a b c'
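The split-then-join idea can also do the removal by itself, without calling replace() at all: split the line into words, keep only the ones you want, and join them back. The following is a rough sketch under that assumption; the function name drop_articles and the ARTICLES set are hypothetical and only mirror the words from the question.

```python
# Hypothetical set of words to drop, mirroring the keys of remArticles
# from the question without their padding spaces.
ARTICLES = {"a", "an", "the", "and"}


def drop_articles(line):
    """Split on whitespace, filter out the listed words, re-join with spaces."""
    kept = [word for word in line.split() if word.lower() not in ARTICLES]
    return " ".join(kept)
```

Because split() with no arguments also discards the trailing newline, you would need to add "\n" back yourself when writing each processed line to the output file.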
Upvotes: 1