Jay Gattuso

Reputation: 4130

Python - convert a raw binary dump into ASCII HEX bytes

Further to this question: Handling and working with binary data HEX with python (and thanks to the awesome pointers I received), I'm stuck on one last aspect of my tool.

I am basically writing a cleaner for files that have data past the EOF marker. This extra data means they fail some validation tools. I need to strip the extra data so the files can be presented to the validator, but I don't want to throw this data away (in fact I have to keep it...)

I've written an XML container to hold the data, plus a few other provenance/audit type values, but I'm (still) stuck on elegantly moving between raw binary and something I can "bake" into a file.

example:

A jpg file ends with (hex editor view) 96 1a 9c fd ab 4f 9e 69 27 ad fd da 0a db 76 bb ee d2 6a fd ff 00 ff d9 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff

The EOF marker for jpg is ff d9, so the cleaner works backwards through the file until it finds a match against the EOF marker. In this case it would create a new jpg file stopping at the ff d9 and then attempt to write the stripped data to the XML (via the ElementTree lib): changeString.text = str(excessData)
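For reference, the splitting step described above could be sketched roughly like this. The names split_at_eoi and JPEG_EOI are made up for illustration and are not taken from my actual script, and the sketch assumes the whole file has already been read into memory as bytes:

```python
JPEG_EOI = b"\xff\xd9"  # JPEG end-of-image marker (hypothetical constant name)

def split_at_eoi(data):
    """Return (clean, excess), cutting just after the last EOI marker."""
    pos = data.rfind(JPEG_EOI)       # search backwards from the end
    if pos == -1:
        return data, b""             # no marker found: nothing to strip
    end = pos + len(JPEG_EOI)
    return data[:end], data[end:]

# Tail of the example file above, shortened
sample = b"\xfd\xff\x00\xff\xd9" + b"\xff" * 4
clean, excess = split_at_eoi(sample)
```

One caveat with this approach: if the trailing data itself happens to contain ff d9, the last match will fall inside the excess data rather than at the real end of the image.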

Of course this won't work, as the XML writer expects to write ASCII, not binary dumps.

In the above case, the error is UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 0: ordinal not in range(128), which I can see is because 0xff is not a valid ASCII character.

My question, therefore, is how do I elegantly deal with this raw data in a way that it can be stored and used in the future? (I plan to write an 'uncleaner' next that can take the clean file and the XML and reconstruct the original file...)

______EDIT_______

Using the suggestions from below, this is the traceback:

Traceback (most recent call last):
  File "C:\...\EOF_cleaner\scripts\test6.py", line 87, in <module>
    main()
  File "C:\...\EOF_cleaner\scripts\test6.py", line 73, in main
    splitFile(f_data, offset)
  File "C:\...EOF_cleaner\scripts\test6.py", line 60, in splitFile
    makeXML(excessData)
  File "C:\...\EOF_cleaner\scripts\test6.py", line 53, in makeXML
    ET.ElementTree(root).write(noteFile)
  File "c:\python27\lib\xml\etree\ElementTree.py", line 815, in write
    serialize(write, self._root, encoding, qnames, namespaces)
  File "c:\python27\lib\xml\etree\ElementTree.py", line 934, in _serialize_xml
    _serialize_xml(write, e, encoding, qnames, None)
  File "c:\python27\lib\xml\etree\ElementTree.py", line 934, in _serialize_xml
    _serialize_xml(write, e, encoding, qnames, None)
  File "c:\python27\lib\xml\etree\ElementTree.py", line 934, in _serialize_xml
    _serialize_xml(write, e, encoding, qnames, None)
  File "c:\python27\lib\xml\etree\ElementTree.py", line 932, in _serialize_xml
    write(_escape_cdata(text, encoding))
  File "c:\python27\lib\xml\etree\ElementTree.py", line 1068, in _escape_cdata
    return text.encode(encoding, "xmlcharrefreplace")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 0: ordinal not in range(128)

The lines that trigger the error are changeString.text = excessData.encode('base64') (line 45) and ET.ElementTree(root).write(noteFile) (line 53).

Upvotes: 0

Views: 8268

Answers (2)

Greg Hewgill

Reputation: 993085

To convert raw bytes to space-separated ASCII hex, you can use something like:

>>> a = b"abc\x01\x02"
>>> print(" ".join("{:02x}".format(x) for x in bytearray(a)))
61 62 63 01 02

However, as mentioned in other answers, something like Base64 is probably going to be more efficient and easier to work with.
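If you do go with hex, the reverse direction matters too, since the data has to be restored later. Here is a minimal sketch of the round trip using the standard binascii module; the variable names are illustrative only, and it is written with Python 3 in mind:

```python
import binascii

raw = b"abc\x01\x02"

# bytes -> compact hex text -> space-separated hex pairs
hex_text = binascii.hexlify(raw).decode("ascii")
spaced = " ".join(hex_text[i:i + 2] for i in range(0, len(hex_text), 2))

# space-separated hex pairs -> original bytes
restored = binascii.unhexlify(spaced.replace(" ", ""))
```

Note that hex uses two characters per byte (three with the separating spaces), which is part of why Base64 tends to be the more compact choice.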

Upvotes: 1

Martijn Pieters

Reputation: 1121864

Use Base64:

excessData.encode('base64')

It'll be easy to turn that back to binary data later on with a simple .decode('base64') call.

Base64 encodes to ASCII data safe for inclusion in XML, in a reasonably compact format; every 3 bytes of binary information become 4 Base64 characters.
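The 'base64' string codec used above exists only in Python 2. As a rough sketch of the same idea with the base64 module, which is available in both Python 2 and 3, the round trip through ElementTree might look like this. The element names note and excessData and the sample bytes are placeholders, not taken from the question's script:

```python
import base64
import xml.etree.ElementTree as ET

excess_data = b"\xff" * 8  # stand-in for the bytes stripped after the EOF marker

# Cleaner side: store the binary data as Base64 text in the XML
root = ET.Element("note")                   # hypothetical element name
change = ET.SubElement(root, "excessData")  # hypothetical element name
change.text = base64.b64encode(excess_data).decode("ascii")
xml_bytes = ET.tostring(root)

# "Uncleaner" side: read the text back and decode it to the original bytes
parsed = ET.fromstring(xml_bytes)
restored = base64.b64decode(parsed.find("excessData").text)
```

Decoding the Base64 output to an ASCII text string before assigning it to .text keeps ElementTree working with text rather than raw bytes.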

Upvotes: 4
