bigTree

Reputation: 2173

Encoding error on Python

I am running the following code with Python 3.2:

from csv import reader, writer


def my_function(file1, file2, output, xs, stringL = 'k', delim = ','):

    with open(file1, 'r') as text, open(file2, 'r') as src, open(output, 'w') as dst:
        for l in text:
            for x in xs:
                if stringL in l:
                    print("found!")

        my_reader = reader(src, delimiter = delim)
        my_writer = writer(dst, delimiter = delim)

        columnNumber = 0
        for column in zip(*my_reader):
            print(column, columnNumber)
            columnNumber += 1


if __name__ == '__main__':
    from sys import argv
    if len(argv) == 5:
        my_function(argv[1], argv[2], argv[3], argv[4])
    elif len(argv) == 6:
        my_function(argv[1], argv[2], argv[3], argv[4], argv[5])
    elif len(argv) == 7:
        my_function(argv[1], argv[2], argv[3], argv[4], argv[5], argv[6])
    else:
        print("Invalid number of arguments")
    print("Done")

file1 is a text file like:

a
k
k
a
k
k
a
a
a
z

a
a
a

file2 is any CSV file.

I encounter the error:

  File "error.py", line 16, in my_function
  for column in zip(*my_reader):
  File "/usr/lib/python3.2/codecs.py", line 300, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xde in position 12: invalid continuation byte

I found the same error here with a solution to it. However, I have trouble adapting that solution to my code. I tried several things, such as

column = unicode(column, errors = 'replace')

but it still doesn't work.

Could you please help me?

Upvotes: 0

Views: 1670

Answers (1)

Martijn Pieters

Reputation: 1121914

Python 3 opens text files using the platform's default encoding (UTF-8 on your system, judging by the traceback) and decodes their contents to Unicode values. Your input file is not UTF-8, however, so decoding fails.

It is impossible to deduce from the error message or your post what the correct encoding is, but you need to find out and specify it when opening the file:

open(file2, 'r', encoding='*correct encoding for file2*', newline='') as src

Note the newline='' as well; see the csv.reader() documentation.
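As a rough sketch of the above: the snippet below assumes, purely for illustration, that the file is Latin-1 encoded. The real encoding of your file is unknown, and read_columns and the sample data are made up for this example.

```python
import csv
import os
import tempfile


def read_columns(path, encoding, delim=','):
    """Return the CSV's columns as a list of tuples, decoding with `encoding`."""
    # newline='' lets the csv module handle line endings itself.
    with open(path, 'r', encoding=encoding, newline='') as src:
        return list(zip(*csv.reader(src, delimiter=delim)))


# Build a small sample file whose bytes are not valid UTF-8.
# In Latin-1, 'Þ' is the single byte 0xde, as in the traceback.
sample = 'name,city\nÞór,Reykjavík\n'.encode('latin-1')
with tempfile.NamedTemporaryFile(delete=False, suffix='.csv') as tmp:
    tmp.write(sample)
    sample_path = tmp.name

# Decoding as UTF-8 fails, just like in the question.
try:
    read_columns(sample_path, 'utf-8')
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

# Decoding with the (assumed) correct encoding works.
columns = read_columns(sample_path, 'latin-1')
os.remove(sample_path)

print(utf8_failed)
print(columns)
```

If you really cannot determine the encoding, open() also accepts an errors argument (for example errors='replace'), but undecodable bytes are then replaced with a placeholder character, so data is lost.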

Your sys.argv handling is overly verbose; just use:

if __name__ == '__main__':
    from sys import argv
    if 5 <= len(argv) <= 7:
        my_function(*argv[1:])
    else:
        print("Invalid number of arguments")
    print("Done")
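To see that the unpacking behaves like the three original branches, here is a small sketch. my_function is replaced by a stand-in that only echoes its arguments, and dispatch and the file names are hypothetical, introduced just for this demonstration.

```python
def my_function(file1, file2, output, xs, stringL='k', delim=','):
    # Stand-in for the real function: just report what it received.
    return (file1, file2, output, xs, stringL, delim)


def dispatch(argv):
    """Call my_function with the command-line arguments, or return None."""
    if 5 <= len(argv) <= 7:
        # argv[0] is the script name; the rest map onto the parameters
        # in order, and any missing trailing ones keep their defaults.
        return my_function(*argv[1:])
    return None


four = dispatch(['error.py', 'a.txt', 'b.csv', 'out.csv', 'xs'])
six = dispatch(['error.py', 'a.txt', 'b.csv', 'out.csv', 'xs', 'z', ';'])
too_few = dispatch(['error.py', 'a.txt'])

print(four)
print(six)
print(too_few)
```

With four arguments the defaults for stringL and delim are kept; with six, both are overridden.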

Upvotes: 1
