Reputation: 492
I have some lines in a text document I am trying to replace/remove. The document is in the ISO-8859-1 character encoding.
When I copy this line into my Python script to replace it, it won't match. If I shorten the line so that it stops before the first double quotation mark (“), the replacement works fine.
For example:
desc = [x.replace('Random text “^char”:', '') for x in desc]
This will not match. If I enter:
desc = [x.replace('Random text :', '') for x in desc]
It matches fine. I have checked that it isn't the ^ symbol as well. Clearly Python IDLE is not using the same character set as my text file and is changing the symbol when I paste it into the script. So how do I get my script to look for this line if it doesn't handle the same characters?
Upvotes: 1
Views: 101
Reputation: 55499
Unfortunately, there's no sure-fire way to determine the encoding of a plain text document, although there are packages that can make very good guesses by analyzing the contents of the document. One popular 3rd-party module for encoding detection is chardet. Or you could manually use trial and error with some popular encodings and see what works.
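The trial-and-error approach could be sketched roughly as follows. One caveat: ISO-8859-1 maps every byte to some character, so decoding with it never raises an error. The curly quotes “ ” are also not part of ISO-8859-1 at all (in cp1252 they are the bytes 0x93 and 0x94). A more useful check is therefore whether the text you're searching for actually appears after decoding. The helper name guess_encoding, the candidate list, and the sample bytes are illustrative assumptions, not part of any library.

```python
def guess_encoding(raw, needle, candidates=('utf-8', 'cp1252', 'latin-1')):
    """Return the first candidate encoding under which `raw` decodes
    cleanly and contains `needle`, or None if no candidate qualifies."""
    for enc in candidates:
        try:
            text = raw.decode(enc)
        except UnicodeDecodeError:
            # These bytes are not valid in this encoding; try the next one
            continue
        if needle in text:
            return enc
    return None

# Sample bytes standing in for the file contents (hypothetical data)
raw = 'Random text “^char”: for testing\n'.encode('cp1252')
print(guess_encoding(raw, 'Random text “^char”:'))
```

For a real file, you would read the bytes with open(fname, 'rb') and pass them in as raw.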
Once you've determined the correct encoding, the replacement operation itself is simple in Python 3. The core idea is to pass the encoding to the open() function, so that you can write Unicode string objects to the file, or read Unicode string objects from the file. Here's a short demo. This will work correctly if the encoding of your terminal is set to UTF-8. I've tested it on Python 3.6.0, both in the Bash shell and in idle3.6.
fname = 'test.txt'
encoding = 'cp1252'
data = 'This is some Random text “^char”: for testing\n'
print(data)
# Save the text to file
with open(fname, 'w', encoding=encoding) as f:
    f.write(data)
# Read it back in
with open(fname, 'r', encoding=encoding) as f:
    text = f.read()
print(text, text == data)
# Perform the replacement
target = 'Random text “^char”:'
out = text.replace(target, 'XXX')
print(out)
output
This is some Random text “^char”: for testing
This is some Random text “^char”: for testing
True
This is some XXX for testing
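Applied to the list comprehension from the question, the same idea might look like the sketch below. The file name desc.txt, its contents, and the cp1252 encoding are assumptions for illustration; substitute whatever encoding your file actually uses.

```python
fname = 'desc.txt'      # hypothetical file name
encoding = 'cp1252'     # assumed encoding of the input file

# Create a small sample file so the sketch is self-contained
with open(fname, 'w', encoding=encoding) as f:
    f.write('Random text “^char”: first item\nunrelated line\n')

# Read the lines as Unicode strings, decoding with the file's encoding
with open(fname, 'r', encoding=encoding) as f:
    desc = f.read().splitlines()

# The target matches because the file text and the literal are both Unicode strings
desc = [x.replace('Random text “^char”:', '') for x in desc]
print(desc)
```

The key point is that the decoding happens once, in open(); after that, the replace call compares Unicode strings and doesn't depend on how the file was stored on disk.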
Upvotes: 1