BTG123

Reputation: 147

Not able to read file due to Unicode error in Python

I'm trying to read a file and when I'm reading it, I'm getting a unicode error.

def reading_File(self, text):
    url_text = "Text1.txt"
    with open(url_text) as f:
        content = f.read()

Error:

    content = f.read()  # Read the whole file
  File "/home/soft/anaconda/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0x92 in position 404: ordinal not in range(128)

Why is this happening? I'm trying to run the same on Linux system, but on Windows it runs properly.

Upvotes: 2

Views: 9992

Answers (4)

snakecharmerb

Reputation: 55599

According to the question,

I'm trying to run the same on Linux system, but on Windows it runs properly.

Since we know from the question and some of the other answers that the file's contents are neither ASCII nor UTF-8, it's a reasonable guess that the file is encoded with one of the 8-bit encodings common on Windows.

As it happens, 0x92 maps to the character 'RIGHT SINGLE QUOTATION MARK' in the cp125* encodings, which are used in US and Latin/European regions.
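
That mapping is easy to check with the standard library. As a small sketch, decoding the single byte 0x92 with cp1252 gives the curly apostrophe, while ASCII and UTF-8 both reject it:

```python
import unicodedata

raw = b"\x92"  # the byte reported in the traceback

# cp1252 maps 0x92 to a printable character.
char = raw.decode("cp1252")
name = unicodedata.name(char)
print(repr(char), name)

# ASCII and UTF-8 both fail on a lone 0x92 byte.
failures = []
for codec in ("ascii", "utf-8"):
    try:
        raw.decode(codec)
    except UnicodeDecodeError:
        failures.append(codec)
print(failures)
```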

So the file should probably be opened like this:

# Python3
with open(url_text, encoding='cp1252') as f:
    content = f.read()

# Python2
import codecs
with codecs.open(url_text, encoding='cp1252') as f:
    content = f.read()

Upvotes: 3

Stop harming Monica

Reputation: 12590

There can be two reasons for that to happen:

  1. The file contains text encoded with an encoding other than 'ascii' and, according to your comments on other answers, 'utf-8'.

  2. The file doesn't contain text at all, it is binary data.

In case 1 you need to figure out how the text was encoded and use that encoding to open the file:

open(url_text, encoding=your_encoding)

In case 2 you need to open the file in binary mode:

open(url_text, 'rb')
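
If the encoding is unknown, one rough approach is to read the raw bytes and try a short list of candidate encodings in order. This is only a sketch: the helper name `decode_with_fallback` and the candidate list are assumptions made for illustration, and a successful decode does not prove the guess is right, because some 8-bit encodings accept almost any byte sequence.

```python
def decode_with_fallback(data, candidates=("utf-8", "cp1252")):
    """Return (text, encoding) for the first candidate that decodes data."""
    for encoding in candidates:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the data")


# Example with bytes similar to the ones in the question; for a real file,
# read the bytes first with: open(url_text, 'rb').read()
sample = b"It\x92s a test"
text, used = decode_with_fallback(sample)
print(used, text)
```

Putting 'utf-8' first is a deliberate choice: it is strict, so it rarely decodes text that was written in another encoding, whereas cp1252 will accept most byte values.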

Upvotes: 1

napuzba

Reputation: 6288

You can use codecs.open to fix this issue with the correct encoding:

import codecs
with codecs.open(filename, 'r', 'utf8') as ff:
    content = ff.read()

Upvotes: 0

Bijendra

Reputation: 10015

It looks like the default encoding in your environment is ASCII, whereas Python 3 usually ends up with UTF-8. You can pass the encoding explicitly when opening the file:

open(file, encoding='utf-8')

Check your system default encoding,

>>> import sys
>>> sys.stdout.encoding
'UTF-8'

If it's not UTF-8, set your system's locale to a UTF-8 one:

export LANGUAGE=en_US.UTF-8
export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8
export LC_CTYPE=en_US.UTF-8
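
Note that in Python 3 the encoding open() uses by default in text mode comes from the locale settings rather than from sys.stdout, so it may be worth checking that value as well. A minimal sketch, assuming a Python 3 interpreter like the one in the question:

```python
import locale

# open() without an encoding argument falls back to this value in text mode.
default_encoding = locale.getpreferredencoding(False)
print(default_encoding)
```

If this prints an ASCII-style name instead of UTF-8, that would be consistent with the 'ascii' codec error in the traceback.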

Upvotes: 0
