pbecker13

Reputation: 83

Python UnicodeDecodeError

I am writing a Python program that reads the output of the DOS tree command, which I redirect into a text file. When I reach the 533rd iteration of the loop, Eclipse reports an error:

Traceback (most recent call last):
  File "E:\Peter\Documents\Eclipse Workspace\MusicManagement\InputTest.py", line 24, in <module>
    input = myfile.readline()
  File "C:\Python33\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 3551: character maps to <undefined>

I have read other posts, but setting the encoding to latin-1 does not resolve the issue: it raises a UnicodeDecodeError on a different character, and so does utf-8.

The following is the code:

import os
from Album import *

os.system("tree F:\\Music > tree.txt")

myfile = open('tree.txt')
myfile.readline()
myfile.readline()
myfile.readline()

albums = []
x = 0

while x < 533:
    input = myfile.readline()
    # readline() returns an empty string at end of file
    if not input: break
    if len(input) < 14:
        artist = input[4:-1]
    elif input[13] != '-':
        artist = input[4:-1]
    else:
        albums.append(Album(artist, input[15:-1], input[8:12]))
    x += 1

for x in albums:
    print(x.artist + ' - ' + x.title + ' (' + str(x.year) + ')')

Upvotes: 8

Views: 7788

Answers (2)

Martijn Pieters

Reputation: 1125148

You need to figure out what encoding tree.com used; according to this post, that could be any of the MS-DOS codepages.

You could go through each of the MS-DOS encodings; most of those have a codec in the Python standard library. I'd try cp437 and cp850 first; the latter is the MS-DOS predecessor of cp1252, I think.

Pass the encoding to open():

myfile = open('tree.txt', encoding='cp437')
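As a rough sketch of trying several encodings in turn: the sample bytes and the helper name first_working_encoding below are made up for illustration and are not taken from the question's file. The sample contains byte 0x81, which cp1252 leaves undefined, so cp1252 fails and the next candidate is tried.

```python
def first_working_encoding(data, candidates=("cp1252", "cp437", "cp850")):
    """Return (encoding_name, decoded_text) for the first candidate that decodes.

    Hypothetical helper: it only shows that a codec *can* decode the bytes,
    not that it is the encoding tree.com actually used.
    """
    for name in candidates:
        try:
            return name, data.decode(name)
        except UnicodeDecodeError:
            continue  # this codec has no mapping for some byte; try the next
    return None, None


# Illustrative bytes only: box-drawing characters plus a name containing 0x81.
sample = b"\xc3\xc4\xc4\xc4H\x81sker D\x81"

name, text = first_working_encoding(sample)
print(name, text)
```

A decode that succeeds is not proof of the right encoding (cp437 maps every byte to some character), so it is still worth checking that the box-drawing characters and accented names look sensible.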

You really should look into using os.walk() instead of tree.com for this task, though; at the very least it'll save you from having to deal with issues like these.
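A minimal sketch of the os.walk() approach, assuming the layout implied by the slicing in the question: one folder per artist, each containing album folders named "YYYY - Title". The function name list_albums is hypothetical, and it yields plain tuples rather than the question's Album objects.

```python
import os


def list_albums(root):
    """Yield (artist, year, title) for folders laid out as root/Artist/'YYYY - Title'."""
    root = os.path.abspath(root)
    for dirpath, dirnames, _filenames in os.walk(root):
        dirnames.sort()  # sorting in place makes the traversal order predictable
        rel = os.path.relpath(dirpath, root)
        if rel == os.curdir:
            continue  # the root itself is neither an artist nor an album
        parts = rel.split(os.sep)
        if len(parts) == 2:
            artist, album = parts
            year, sep, title = album.partition(" - ")
            if sep and year.isdigit():
                yield artist, int(year), title
            dirnames[:] = []  # do not descend below the album level
```

It could be called as `for artist, year, title in list_albums("F:\\Music"): ...`. Because the names come from the file system as str objects, no text file has to be decoded.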

Upvotes: 9

Emanuele Paolini

Reputation: 10172

In this line:

myfile = open('tree.txt')

you should specify the encoding of your file. On Windows, try:

myfile = open('tree.txt', encoding='cp1250')

Upvotes: 1
