UnicodeDecodeError in reading a file

Question

When I read the whole file at once, my script works without a problem:

fst = 0
with open(in_ckfile, 'rb', 0) as file:
    with open(outfile_namepath, mode='wb') as outfile:
        while True:
            # buf = file.read(204800)  # chunked read: raises UnicodeDecodeError
            buf = file.read()  # whole-file read: works

            if buf:
                fst += 1
                print('read no., len of buf ......: ', fst, len(buf))

                buf = buf.decode()
                xbytes = bytearray()
                xbytes.extend(map(ord, buf))  
                buf = xbytes

                print('read no., len of decode buf: ', fst, len(buf))

The output is:

read no., len of buf ......:  1 26848013
read no., len of decode buf:  1 18546777
len of in string ..........:  18546777
len of output str, checked :  18546777 370130

However, when I read the file in chunks with buf = file.read(204800), it raises an error on the third chunk:

read no., len of buf ......:  1 204800
read no., len of decode buf:  1 141406
len of in string ..........:  141406
len of output str, checked :  141406 2827 

read no., len of buf ......:  2 204800
read no., len of decode buf:  2 141606
len of in string ..........:  141606
len of output str, checked :  141606 2800 

read no., len of buf ......:  3 204800
Traceback (most recent call last):
  File "", line 1, in 
  ...
  buf = buf.decode()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc3 in position 204799: unexpected end of data

How do I fix this?
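
The traceback hints at the cause: 0xc3 is the lead byte of a two-byte UTF-8 sequence, and it sits at position 204799, the last byte of the 204800-byte chunk, so its continuation byte only arrives with the next read. Calling buf.decode() on each chunk separately cannot handle a character split this way. One possible approach, sketched below, is an incremental decoder from the standard codecs module, which holds back an incomplete trailing sequence until the next chunk. This assumes the file really is UTF-8; the function name decode_in_chunks and its parameters are made up for illustration and are not part of the original script.

```python
import codecs


def decode_in_chunks(path, chunk_size=204800, encoding="utf-8"):
    """Yield decoded text pieces from a file that is read in binary chunks.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    # The incremental decoder buffers an incomplete multi-byte sequence
    # at the end of one chunk and completes it with the next chunk.
    decoder = codecs.getincrementaldecoder(encoding)()
    with open(path, "rb") as f:
        while True:
            buf = f.read(chunk_size)
            if not buf:
                break  # end of file
            text = decoder.decode(buf)
            if text:
                yield text
    # final=True flushes the decoder; it raises UnicodeDecodeError
    # if the file itself ends in the middle of a character.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail
```

In the loop from the question, the equivalent change would be to create the decoder once before the loop and replace buf = buf.decode() with buf = decoder.decode(buf). Note that the decoded length of a chunk can then differ slightly from what a standalone decode would give, because a split character is counted with the following chunk.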
