Charles Saag

Reputation: 631

How do I print non-ASCII characters in a file using python3?

Here's an example of my code. It is very simple as you'll see. When I use it to print a file from an Ubuntu terminal window, I get the following error message:

Traceback (most recent call last):
  File "/ascii_cat", line 22, in <module>
    print_file_in_ascii(f)
  File "/ascii_cat", line 16, in print_file_in_ascii
    for line in f:
  File "/usr/lib/python3.4/codecs.py", line 319, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

Code:

#!/usr/bin/python3

import sys

def contains_only_ascii(a_string):
    try:
        for a_char in a_string.strip():
            if ord(a_char) < 32 or ord(a_char) > 126:
                return False
    except:
        pass
    return True

def print_file_in_ascii(fname):
    with open(fname, "r") as f:
        for line in f:
            if contains_only_ascii(line) == True:
                print(line, end="")

# sys.argv may hold multiple files when a * is used in a filename (shell globbing)
for f in sys.argv[1:]:
    print_file_in_ascii(f)

Upvotes: 0

Views: 173

Answers (1)

Mark Tolonen

Reputation: 178179

You've opened the file with the default encoding, which on your system is UTF-8. The file is not encoded in UTF-8, so decoding fails as soon as the file is read, and that raises the exception.

Open the file in the correct encoding by specifying an encoding= parameter explicitly:

with open(fname, encoding='whatever_the_encoding_really_is') as f:
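If the real encoding isn't known, one possible workaround is to decode as latin-1, which maps every byte value to a character and therefore never raises UnicodeDecodeError. The sketch below assumes that approach is acceptable for simply filtering out non-printable lines; the function name printable_ascii_lines and the sample bytes are made up for illustration and are not from the question.

```python
import os
import tempfile


def printable_ascii_lines(fname, encoding="latin-1"):
    """Return the lines of fname that contain only printable ASCII characters."""
    kept = []
    # Assumption: latin-1 is used only because it can decode any byte sequence;
    # pass the file's real encoding here if you know it.
    with open(fname, "r", encoding=encoding) as f:
        for line in f:
            # Same printable range (32..126) as the check in the question.
            if all(32 <= ord(ch) <= 126 for ch in line.strip()):
                kept.append(line)
    return kept


# Build a small throwaway file containing the byte 0x8b from the traceback.
with tempfile.NamedTemporaryFile("wb", suffix=".bin", delete=False) as tmp:
    tmp.write(b"plain text\n\x1f\x8b binary junk\nmore text\n")
    demo_path = tmp.name

try:
    demo_lines = printable_ascii_lines(demo_path)
finally:
    os.remove(demo_path)

print("".join(demo_lines), end="")
```

Another option is to keep the default encoding and pass errors="replace" to open(), which substitutes U+FFFD for bytes that can't be decoded instead of raising.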

Upvotes: 2
