mfg_2018

Reputation: 65

Reading file to string (python)

I just installed Anaconda on a Windows 10 machine (Python 2.7.12 |Anaconda 4.2.0 (64-bit)|), and I am having an issue reading text from a file. Please see the code and output below. I want the actual text from the file.

Thanks!!

Output:

['\xff\xfeT\x00h\x00i\x00s\x00',
 '\x00i\x00s\x00',
 '\x00a\x00',
 '\x00t\x00e\x00s\x00t\x00.\x00',
 '\x00',
 '\x00',
 '\x00',
 '\x00T\x00h\x00i\x00s\x00',
 '\x00i\x00s\x00',
 '\x00a\x00',
 '\x00t\x00e\x00s\x00t\x00']

Code:

try:
    with open('test.txt', 'r') as f:
        text = f.read()
except Exception as e:
    print e
else:
    # Only runs if the file was read successfully
    print text.split()

test.txt:

This is a test.

This is a test

Upvotes: 2

Views: 493

Answers (2)

DYZ

Reputation: 57033

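You have an issue with the text encoding. Your file is not encoded in UTF-8 but in UTF-16: the leading \xff\xfe is the UTF-16 little-endian byte order mark (BOM), and every ASCII character is followed by a \x00 byte. Instead of using open, use: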

import codecs
with codecs.open("test.txt", "r", encoding="utf-16") as f:
    text = f.read()

Or switch to Python 3, which has much better Unicode support.
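In Python 3 the built-in open() accepts the encoding argument directly, so no extra import is needed. A minimal sketch, assuming the file is UTF-16 as in the question; read_words is a hypothetical helper name, and the demo writes its own temporary file instead of relying on test.txt:

```python
import os
import tempfile


def read_words(path, encoding='utf-16'):
    """Read a text file with an explicit encoding and split it into words."""
    # Python 3's open() decodes the bytes for you when given an encoding;
    # the 'utf-16' codec also consumes the byte order mark.
    with open(path, 'r', encoding=encoding) as f:
        return f.read().split()


# Demo: recreate a UTF-16 file like the question's test.txt, then read it back.
fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)
with open(path, 'w', encoding='utf-16') as f:
    f.write('This is a test.\n\nThis is a test')
words = read_words(path)
os.remove(path)
print(words)  # ['This', 'is', 'a', 'test.', 'This', 'is', 'a', 'test']
```

Because the text is decoded before split() runs, the stray \x00 bytes from the question's output never appear.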

Upvotes: 0

miah

Reputation: 10433

I've had the best luck using the io module to open the file with an explicit encoding.

import io
with io.open('test.txt', 'r', encoding='utf-16') as f:
    text = f.read()
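io.open works on both Python 2.7 and Python 3 (in Python 3 it is the same object as the built-in open). If you do not know in advance whether a file is UTF-16 or UTF-8, one option is to look for a byte order mark before choosing the encoding. A minimal sketch; detect_encoding and read_text are hypothetical helper names, and falling back to UTF-8 when no BOM is present is an assumption, not a guarantee:

```python
import codecs
import io


def detect_encoding(path):
    """Guess a file's encoding from its byte order mark (hypothetical helper)."""
    with open(path, 'rb') as f:
        head = f.read(4)
    # Check UTF-32 first: the UTF-32-LE BOM starts with the UTF-16-LE BOM.
    if head.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return 'utf-32'
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return 'utf-16'
    if head.startswith(codecs.BOM_UTF8):
        return 'utf-8-sig'  # this codec strips the UTF-8 BOM while decoding
    return 'utf-8'  # assumption: no BOM means UTF-8


def read_text(path):
    """Open the file with io.open using the sniffed encoding."""
    with io.open(path, 'r', encoding=detect_encoding(path)) as f:
        return f.read()
```

For the question's file, detect_encoding('test.txt') would see the \xff\xfe prefix and return 'utf-16'. Files saved without a BOM cannot be identified this way and still need an explicit encoding.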

Upvotes: 2
