jjcyalater

Reputation: 59

Read an image from file using ascii encoding

I've been having trouble loading images from a file as a string. Many of the functions I need in my program rely on the data being ASCII-encoded, and they fail on the data I give them, producing the following error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xa8 in position 14: ordinal not in range(128)

So how would I go about converting this data to ASCII?

EDIT:

Here is the admittedly messy code I am using. Please do not comment on how messy it is; this is a rough draft:

def text_to_bits(text, encoding='utf-8', errors='surrogatepass'):
    bits = bin(int(binascii.hexlify(text.encode(encoding, errors)), 16))[2:]
    return bits.zfill(8 * ((len(bits) + 7) // 8))

def str2int(string):
    binary = text_to_bits(string)
    number = int(binary, 2)
    return number

def go():
    #filen is the name of the file
    global filen
    #Reading the file
    content = str(open(filen, "r").read())
    #Using A function from above
    integer = str2int(content)
    #Write back to the file
    w = open(filen, "w").write(str(integer))

Upvotes: 0

Views: 4038

Answers (1)

Martijn Pieters

Reputation: 1124090

Image data is not ASCII. Image data is binary, and thus uses bytes that the ASCII standard doesn't cover. Don't try to decode the data as ASCII. You also want to make sure you open your file in binary mode, to avoid platform-specific line separator translations, something that'll damage your image data.

Any method expecting to handle image data will deal with binary data, and in Python 2 that means you'll be handling that as the str type.

In your specific case, you are using a function that expects to work on Unicode data, not binary image data, and it tries to encode that data to binary. In other words, because you are giving it data that is already binary (encoded), the function applies a conversion method for Unicode (to produce a binary representation) to data that is already binary. Python 2 then tries to decode the data first, to give you Unicode to encode. It is that implicit decoding that fails here:

>>> '\xa8'.encode('utf8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa8 in position 0: ordinal not in range(128)

Note that I encoded, but got a decoding exception.
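For comparison, here is a minimal Python 3 sketch of the step that Python 2 performs implicitly. Python 3's bytes type has no encode() method, so the decode has to be written out explicitly; the variable names are illustrative only and not part of the question's code.

```python
raw = b'\xa8'  # a single byte outside the ASCII range (0-127)

# Python 2's str.encode() first performs the equivalent of this decode,
# using the default ASCII codec, before it can encode anything.
try:
    raw.decode('ascii')
except UnicodeDecodeError as exc:
    message = str(exc)

print(message)
```

The failure is the same as in the traceback above: the byte cannot be decoded as ASCII, so there is never any Unicode text to encode.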

The code you are using is extremely convoluted. If you wanted to interpret the whole binary contents of a file as one large integer, you could do it by converting to a hex representation, but there is no need to then convert to a binary string and back to an integer again. The following would suffice:

import binascii

# Binary mode: no decoding and no line separator translation.
with open(filename, 'rb') as fileobj:
    binary_contents = fileobj.read()
    integer_value = int(binascii.hexlify(binary_contents), 16)
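If you are on Python 3 (an assumption; the question uses Python 2), a rough equivalent is int.from_bytes(), which skips the hex step altogether. The helper name bytes_to_int and the sample bytes below are made up for illustration.

```python
import binascii


def bytes_to_int(data):
    """Interpret a bytes object as one big-endian unsigned integer."""
    return int.from_bytes(data, byteorder='big')


# First four bytes of a PNG signature, used here only as sample data.
sample = b'\x89PNG'

# The hexlify route gives the same number for non-empty input.
via_hex = int(binascii.hexlify(sample), 16)
assert bytes_to_int(sample) == via_hex
```

One difference to be aware of: for empty input int.from_bytes() returns 0, while the hexlify route raises a ValueError.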

Image data is not usually interpreted as one long number, however. Binary data can encode integers, but when processing images you'd usually use the struct module to decode specific integer values from specific bytes instead.
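As a sketch of what that can look like, the snippet below reads the width and height from the start of a PNG file, written for Python 3. It assumes the PNG layout of an 8-byte signature followed by an IHDR chunk; the function name png_dimensions and the hand-built header are illustrative and not taken from the question.

```python
import struct

PNG_SIGNATURE = b'\x89PNG\r\n\x1a\n'


def png_dimensions(data):
    """Return (width, height) from the IHDR chunk at the start of PNG data."""
    if data[:8] != PNG_SIGNATURE or data[12:16] != b'IHDR':
        raise ValueError('not a PNG header')
    # Width and height are two big-endian unsigned 32-bit integers.
    width, height = struct.unpack('>II', data[16:24])
    return width, height


# Hand-built header for illustration: signature, chunk length (13),
# chunk type, then width and height. This is not a complete image.
header = (PNG_SIGNATURE + struct.pack('>I', 13) + b'IHDR'
          + struct.pack('>II', 640, 480))
print(png_dimensions(header))  # (640, 480)
```

With a real file you would pass in the first 24 bytes read in binary mode, for example fileobj.read(24).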

Upvotes: 2
