BrtH

Reputation: 2664

Data encoding in python

I download an image from a newsgroup with the nntplib module in Python and then want to save the data to a file. I use:

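# news is the nntplib connection object (its creation is not shown in the question)
news.group('alt.binaries.misc')
# body() returns a tuple whose last element is the list of body lines
data = ''.join(news.body('<DhTgplpHcRsZMBTTw3i35@spot.net>')[-1])
# write the raw string to disk in binary mode; the file is closed automatically
with open('image.png', 'wb') as f:
    f.write(data)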

However, the saved file isn't a proper image file.
data is a string of the form:

'\x89PNG=B=C\x1a=C=A=A=A=BIHDR=A=A\x02X=A=A\x01Q\x08\x06=A=A=A\xa8\x81\xd3\x89=A=A=A\tpHYs=A=A\x0b\x13=A=A\x0b\x13\x01=A\x9a\x9c\x18=A=A etc... '

with a length of 309530.
I can tell from the first bytes that the file should be a PNG file, and the size also seems right to me, so I assume that the data is correct.
Does anyone know what I'm doing wrong?

UPDATE:
I looked in the header of the article and it says:

Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit

I don't think this is very helpful for decoding the text, but who knows.

I also compared my string with the regular header of PNG files, which is \x89PNG\r\n\x1a\n, or \x89PNG\x0d\x0a\x1a\x0a (as alexis also stated).
I concluded that =B stands for \x0d, =C for \x0a and =A for \x00. I assumed that the other \x.. bytes are not encoded, but I wasn't sure (I don't know very much about encodings), and UPDATE3 shows that they do differ.
Which encoding works this way?

UPDATE2: the data: -see below- (repr(data))

UPDATE3: I was able to save the image with another program and then open it in Python. This is what the data should be. -see below-. The beginnings look fairly similar, but after that there is a big difference. What the hell is this encoding? It really frustrates me. (BTW, thanks for all the great help so far.)


All the files: http://dl.dropbox.com/u/1499291/python-encoding-question/index.html

Upvotes: 1

Views: 1637

Answers (3)

Maestro

Reputation: 9518

NULL is replaced with "=A", CR with "=B", LF with "=C", and "=" with "=D", very similar to gZip 8bit (see http://www.imc.org/ietf-usefor/2003/Feb/0575.html).
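
The substitution described above can be undone with a small decoder. This is only a sketch in Python 3, assuming the four escapes listed are the complete scheme; the name `decode_escapes` is made up for illustration. The question's UPDATE3 suggests other bytes differ as well, so this step alone may not restore the whole image.

```python
import re

# Escape table taken from the answer above. Any other "=X" pair is left as-is.
ESCAPES = {b"=A": b"\x00", b"=B": b"\r", b"=C": b"\n", b"=D": b"="}

def decode_escapes(data: bytes) -> bytes:
    """Undo the =A/=B/=C/=D substitution in a single left-to-right pass."""
    return re.sub(rb"=[A-D]", lambda m: ESCAPES[m.group(0)], data)

# First bytes of the string shown in the question.
sample = b"\x89PNG=B=C\x1a=C=A=A=A=BIHDR"
decoded = decode_escapes(sample)
print(decoded)
```

A single regex pass is used instead of chained `replace()` calls so that a "=" produced by "=D" is never decoded a second time.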

Upvotes: 1

Jörg Beyer

Reputation: 3671

It looks like the data is uuencoded, so you have to decode it before you write it to your image file.

You can verify this by running uudecode on image.png, the file written by your script above.

Python has support for this in its uu module.
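
A quick way to test the uuencode hypothesis is sketched below in Python 3. The `uu` module was removed from the standard library in Python 3.13, so this uses `binascii` instead. The helper `looks_uuencoded` is hypothetical and relies only on the convention that uuencoded payloads open with a "begin <mode> <name>" line.

```python
import binascii

def looks_uuencoded(data: bytes) -> bool:
    """Heuristic: uuencoded payloads conventionally start with a 'begin' line."""
    return data.lstrip().startswith(b"begin ")

# The string from the question starts with raw PNG bytes, not a 'begin' line.
print(looks_uuencoded(b"\x89PNG=B=C\x1a=C"))  # False

# For comparison, this is how one genuinely uuencoded line round-trips.
line = binascii.b2a_uu(b"\x89PNG\r\n\x1a\n")
roundtrip = binascii.a2b_uu(line)
```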

Upvotes: 1

alexis

Reputation: 50220

I don't believe NNTP.body is supposed to automatically decode article content, or is it? Have you looked at the article source? It should specify the encoding.

Anyway the PNG signature should start with \x89PNG, but should then be followed by 0d 0a (CR LF). This ain't it. Could it be base64 encoded, or some such? All those = signs look very familiar.
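
The signature check can be written out as a short Python 3 sketch. The 8-byte PNG signature is standard; the function name `looks_like_png` is just an illustrative choice.

```python
# Standard 8-byte PNG signature: \x89 P N G CR LF \x1a LF
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def looks_like_png(data: bytes) -> bool:
    """Return True if the data starts with the PNG signature."""
    return data[:8] == PNG_SIGNATURE

# First bytes of the string from the question: "=B=C" sits where CR LF should be.
received = b"\x89PNG=B=C\x1a=C=A=A=A=BIHDR"
print(looks_like_png(received))  # False
```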

Upvotes: 0
