Polymath

Reputation: 125

Decoding Keynote IWA protobuf data with Python

Good afternoon,

I am looking for a bit of insight into working with Keynote files (circa 2017, version 8.x).

My objective is fairly basic: I just want to extract the text and images from about 3000 Keynote files. I am working in Python 2.7 because of the age of many of the tools, but I would like to upgrade to Python 3 eventually. Despite a lot of reading and experimenting, I seem to have hit a wall extracting messages from the IWA objects.

I have been experimenting with various approaches and have also been trying to deconstruct the IWA files by hand using the protobuf encoding information. However, something just does not add up. Messages created with the protobuf sample code I can decode completely, but IWA blocks from Keynote files end up with invalid wire types, repeated field numbers, or field sizes that don't make sense (e.g. larger than the size of the IWA block).
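To check mechanically whether a byte range is plain protobuf, it helps to decode the field keys rather than eyeball them. Each key is a varint whose low 3 bits are the wire type and whose remaining bits are the field number; only wire types 0, 1, 2 and 5 are in current use (3 and 4 are deprecated groups, 6 and 7 are never valid), and field number 0 is never valid. Below is a minimal schema-less decoder, written for Python 3; `read_varint` and `decode_fields` are hypothetical helper names, not part of any library. It raises on exactly the symptoms described above.

```python
def read_varint(buf, pos):
    """Decode a base-128 varint at buf[pos]; return (value, next_pos)."""
    value, shift = 0, 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint at offset %d" % pos)
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7
        if shift >= 70:
            raise ValueError("varint longer than 10 bytes")


def decode_fields(buf):
    """Walk a serialized protobuf message without a schema.

    Yields (field_number, wire_type, value) tuples. Raises ValueError on a
    zero field number, an unsupported or invalid wire type, or a length
    that runs past the end of buf.
    """
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if field_number == 0:
            raise ValueError("field number 0 at offset %d" % pos)
        if wire_type == 0:  # varint
            value, pos = read_varint(buf, pos)
        elif wire_type in (1, 5):  # fixed 64-bit / fixed 32-bit
            size = 8 if wire_type == 1 else 4
            if pos + size > len(buf):
                raise ValueError("fixed-width field overruns buffer")
            value, pos = buf[pos:pos + size], pos + size
        elif wire_type == 2:  # length-delimited: strings, bytes, sub-messages
            length, pos = read_varint(buf, pos)
            if pos + length > len(buf):
                raise ValueError("length %d overruns buffer" % length)
            value, pos = buf[pos:pos + length], pos + length
        else:  # 3/4 are deprecated groups; 6/7 are never valid
            raise ValueError("wire type %d at offset %d" % (wire_type, pos))
        yield field_number, wire_type, value
```

For example, `list(decode_fields(b"\x08\x96\x01"))` gives `[(1, 0, 150)]` (field 1, varint, value 150), while a lone `b"\x0e"` (field 1, wire type 6) raises `ValueError`.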

What I think I know:

1/ A .key file is a zipped collection of objects and can be unzipped with a generic module such as zipfile. Once unzipped, it exposes the Index/ directory and its constituent IWA objects.

2/ The IWA files have a 4-byte little-endian header, and the rest should follow Google's protobuf encoding.

3/ The protobuf encoding does hold for some parts of the IWA files, e.g. recognizable blocks of text have the correct tags. However, other parts do not seem to follow the rules, resulting either in invalid wire-type codes (e.g. wire type 6) or in field numbers that are zero or reused.
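Items 1/ and 2/ can be sketched in Python 3 as follows, under two assumptions: that the .key file is a plain zip whose .iwa members sit under Index/, and that each .iwa chunk header has the layout described in the obriensp/iWorkFileFormat notes (one 0x00 byte followed by a 24-bit little-endian length). `read_iwa_members` and `iter_iwa_chunks` are hypothetical names. If that description is right, the chunk payloads are Snappy-compressed, so passing them straight to a protobuf decoder would produce nonsense wire types.

```python
import io
import zipfile


def read_iwa_members(key_source):
    """Return {member_name: raw_bytes} for every .iwa entry in a .key zip.

    key_source may be a filesystem path or the raw bytes of the .key file.
    """
    if isinstance(key_source, (bytes, bytearray)):
        key_source = io.BytesIO(key_source)
    with zipfile.ZipFile(key_source) as zf:
        return {
            name: zf.read(name)
            for name in sorted(zf.namelist())
            if name.endswith(".iwa")
        }


def iter_iwa_chunks(iwa_bytes):
    """Yield the payload of each chunk in a raw .iwa file.

    ASSUMPTION (from the obriensp/iWorkFileFormat notes): every chunk starts
    with a 4-byte header: one 0x00 byte, then the payload length as a
    24-bit little-endian integer. The payloads are Snappy-compressed.
    """
    pos = 0
    while pos < len(iwa_bytes):
        header = iwa_bytes[pos:pos + 4]
        if len(header) < 4:
            raise ValueError("truncated chunk header at offset %d" % pos)
        if header[0] != 0:
            raise ValueError("unexpected chunk type 0x%02x at offset %d" % (header[0], pos))
        length = int.from_bytes(header[1:4], "little")
        start, end = pos + 4, pos + 4 + length
        if end > len(iwa_bytes):
            raise ValueError("chunk at offset %d overruns the file" % pos)
        yield iwa_bytes[start:end]
        pos = end
```

A typical loop would be `for name, raw in read_iwa_members("Presentation.key").items()`, followed by `chunks = list(iter_iwa_chunks(raw))`; images, by contrast, are ordinary zip members and can be read directly with `zf.read`.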

What I would appreciate is if:

A/ Someone could confirm whether the Keynote encoding complies with Google's protobuf encoding, or point me to a valid encoding specification or scheme that I can use.

B/ Someone could clarify whether the IWA objects are individually compressed in addition to the compression applied to the whole .key file. The documentation is unclear, and my attempts to further decompress the IWA objects were not successful.

C/ Someone could direct me to a functional Python library that can extract data from Keynote files. As much as I am having fun playing with file deconstruction at the byte and bit level, I still have an objective to achieve :-)
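On question B: according to the obriensp/iWorkFileFormat notes, each IWA chunk is compressed with Snappy, separately from the zip compression of the .key container, and the framing is non-standard (no stream identifier, no checksums), so a stock Snappy framing decoder would reject the file as a whole. Below is a dependency-free, pure-Python decoder for one raw (unframed) Snappy block, written from the public Snappy format description; `snappy_decompress` is a hypothetical name. It is slow, so for 3000 files the python-snappy package's `snappy.uncompress()` is probably the better choice.

```python
def snappy_decompress(block):
    """Decode one raw (unframed) Snappy block; return the uncompressed bytes."""
    data = bytes(block)
    pos = 0

    def take(n):
        # Return the next n bytes and advance, failing on truncated input.
        nonlocal pos
        if pos + n > len(data):
            raise ValueError("truncated Snappy data at offset %d" % pos)
        chunk = data[pos:pos + n]
        pos += n
        return chunk

    # Preamble: uncompressed length as a little-endian base-128 varint.
    expected, shift = 0, 0
    while True:
        byte = take(1)[0]
        expected |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7

    out = bytearray()
    while pos < len(data):
        tag = take(1)[0]
        kind = tag & 0x03
        if kind == 0:  # literal run
            length = tag >> 2
            if length >= 60:  # 60..63: (length - 1) stored in 1..4 extra bytes
                length = int.from_bytes(take(length - 59), "little")
            out += take(length + 1)
            continue
        if kind == 1:  # copy with 11-bit offset
            length = 4 + ((tag >> 2) & 0x07)
            offset = ((tag >> 5) << 8) | take(1)[0]
        elif kind == 2:  # copy with 16-bit offset
            length = 1 + (tag >> 2)
            offset = int.from_bytes(take(2), "little")
        else:  # copy with 32-bit offset
            length = 1 + (tag >> 2)
            offset = int.from_bytes(take(4), "little")
        if offset == 0 or offset > len(out):
            raise ValueError("invalid back-reference offset %d" % offset)
        start = len(out) - offset
        # Copies may overlap the bytes they produce, so copy byte by byte.
        for i in range(length):
            out.append(out[start + i])

    if len(out) != expected:
        raise ValueError("expected %d bytes, got %d" % (expected, len(out)))
    return bytes(out)
```

Assuming the chunk layout from those notes, a whole .iwa would decompress as `b"".join(snappy_decompress(chunk) for chunk in chunks)`, and only that joined result should be handed to a protobuf decoder.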

Thank you.

Rusty

Any insights gratefully accepted

Upvotes: 0

Views: 790

Answers (1)

cholm

Reputation: 481

I know this is a relatively old question, but I came across it and would like to offer some information.

The page https://github.com/obriensp/iWorkFileFormat/blob/master/Docs/index.md#iwa seems to have a lot of information on the format. In particular, from what I gather from that page, IWA data does not follow the protobuf encoding exactly, which is probably the cause of your problems with invalid wire types and nonsensical field lengths.
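To illustrate what "not exactly protobuf" means in practice: after Snappy decompression, that page describes the stream as a sequence of archives, each made of a varint length, a protobuf ArchiveInfo message, and then the payload messages whose lengths are listed in the ArchiveInfo's MessageInfo entries. The sketch below walks that layout in Python 3. The field numbers it uses (ArchiveInfo.identifier = 1, message_infos = 2; MessageInfo.type = 1, length = 3) are assumptions based on the reverse-engineered .proto files in that repository and should be checked against them; `iter_archives` and its helpers are hypothetical names. Each yielded payload is itself a protobuf message whose schema depends on its type number.

```python
def read_varint(buf, pos):
    """Decode a base-128 varint at buf[pos]; return (value, next_pos)."""
    value, shift = 0, 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint at offset %d" % pos)
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def iter_fields(buf):
    """Minimal schema-less protobuf walk yielding (field_number, value)."""
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        wire_type = key & 0x07
        if wire_type == 0:
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type in (1, 5):
            size = 8 if wire_type == 1 else 4
            value, pos = buf[pos:pos + size], pos + size
        else:
            raise ValueError("unexpected wire type %d" % wire_type)
        yield key >> 3, value


def iter_archives(stream):
    """Yield (identifier, [(message_type, payload_bytes), ...]) per archive.

    ASSUMED layout of a decompressed IWA stream: varint length, ArchiveInfo
    message, then one payload per MessageInfo, in order. ASSUMED field
    numbers: ArchiveInfo.identifier = 1, ArchiveInfo.message_infos = 2,
    MessageInfo.type = 1, MessageInfo.length = 3.
    """
    pos = 0
    while pos < len(stream):
        info_len, pos = read_varint(stream, pos)
        if pos + info_len > len(stream):
            raise ValueError("ArchiveInfo at offset %d overruns stream" % pos)
        info, pos = stream[pos:pos + info_len], pos + info_len
        identifier, messages = None, []
        for number, value in iter_fields(info):
            if number == 1:
                identifier = value
            elif number == 2:
                sub = dict(iter_fields(value))  # last value wins per field number
                msg_type, msg_len = sub.get(1), sub.get(3)
                if msg_len is None or pos + msg_len > len(stream):
                    raise ValueError("bad MessageInfo length at offset %d" % pos)
                messages.append((msg_type, stream[pos:pos + msg_len]))
                pos += msg_len
        yield identifier, messages
```

Reading the ArchiveInfo first and slicing payloads by its declared lengths is what keeps the decoder from treating one message's tail as the next message's keys, which is one plausible source of the zero or reused field numbers in the question.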

Upvotes: 0
