Reputation: 65
I just installed Anaconda on a Windows 10 machine (Python 2.7.12 |Anaconda 4.2.0 (64-bit)|) and I am having an issue reading text from a file. Please see the code and output below. I want the actual text from the file.
Thanks!!
Output:
['\xff\xfeT\x00h\x00i\x00s\x00',
'\x00i\x00s\x00',
'\x00a\x00',
'\x00t\x00e\x00s\x00t\x00.\x00',
'\x00',
'\x00',
'\x00',
'\x00T\x00h\x00i\x00s\x00',
'\x00i\x00s\x00',
'\x00a\x00',
'\x00t\x00e\x00s\x00t\x00']
Code:
try:
    with open('test.txt', 'r') as f:
        text = f.read()
except Exception as e:
    print e
print text.split()
test.txt:
This is a test.
This is a test
Upvotes: 2
Views: 493
Reputation: 57033
You have a text encoding issue. Your file is encoded in UTF-16, not UTF-8 (the leading \xff\xfe in your output is the UTF-16 little-endian byte order mark). Instead of using open, use:
import codecs
with codecs.open("test.txt", "r", encoding="utf-16") as f:
    text = f.read()
Or switch to Python 3, which has much better Unicode support.
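To see why the output in the question is full of \x00 bytes, here is a minimal sketch that decodes the first item of that output by hand. The variable names are only illustrative, and the byte string is copied from the question rather than read from a real file.

```python
import codecs

# First item of the output shown in the question, as raw bytes.
raw = b'\xff\xfeT\x00h\x00i\x00s\x00'

# The first two bytes match the UTF-16 little-endian byte order mark.
has_utf16_le_bom = raw.startswith(codecs.BOM_UTF16_LE)

# Decoding as "utf-16" consumes the BOM and pairs up the remaining bytes.
decoded = raw.decode('utf-16')

print(has_utf16_le_bom)
print(decoded)
```

Each character takes two bytes in this encoding, which is why reading the file as plain bytes shows a \x00 after every letter.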
Upvotes: 0
Reputation: 10433
I've had the best luck using the io module to open the file with an explicit encoding.
import io
with io.open(FILE, 'r', encoding='utf-16') as f:
    job = f.read()
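As a rough end-to-end check, the sketch below writes a UTF-16 file and reads it back with io.open. The temporary path is a hypothetical stand-in for test.txt, and the file contents are assumed to match the sample in the question.

```python
import io
import os
import tempfile

# Hypothetical temporary file standing in for test.txt.
path = os.path.join(tempfile.mkdtemp(), 'test.txt')

# Write the sample text as UTF-16 (the codec adds the byte order mark).
with io.open(path, 'w', encoding='utf-16') as f:
    f.write(u'This is a test.\n\nThis is a test')

# Read it back with the same explicit encoding, then split on whitespace.
with io.open(path, 'r', encoding='utf-16') as f:
    words = f.read().split()

print(words)
```

io.open behaves the same way on Python 2.7 and Python 3, so this approach should survive a later upgrade.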
Upvotes: 2