Oonah

Reputation: 73

Reading data from a VSD (Windows Visio Binary) file in Python (Linux) with OLE Tools is unclear. Is there another way to extract the data?

I am trying to read the contents of a Visio Binary .VSD file which contains information from a graph I have made.

I have tried using oletools and olefile but cannot read the contents correctly. I can open the file with oletools, but when I dump the contents and view them with the 'xxd' command (in a terminal), I can't clearly see the text that I saved in the file. There are a lot of extra \x00 and \xff bytes and other characters mixed in, and removing them makes it worse. I did exactly the same with a .doc file and was able to open it and read the contents clearly.
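One possible reason for the stray \x00 bytes is that the text is stored as UTF-16LE, where every ASCII character is followed by a zero byte; that is an assumption about the dump, not something confirmed here. Under that assumption, a minimal sketch for pulling readable runs out of the dumped bytes could look like this (the function name `utf16le_strings` is made up for illustration, and any text stored in compressed form would not be found this way):

```python
import re
from typing import List


def utf16le_strings(data: bytes, min_chars: int = 4) -> List[str]:
    """Return runs of printable ASCII characters that are encoded as UTF-16LE.

    Assumption: each character is stored as one printable ASCII byte
    followed by a single \x00 byte. Shorter runs are treated as noise.
    """
    # One printable ASCII byte followed by NUL, repeated at least min_chars times.
    pattern = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_chars)
    return [match.group().decode("utf-16-le") for match in pattern.finditer(data)]
```

Calling `utf16le_strings(open("dump.bin", "rb").read())` on a dumped stream (the file name is a placeholder) would list whatever plain UTF-16LE text survives in it.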

Can anyone please point me in the right direction if I am doing this wrong, or toward other tools that work?

Upvotes: 1

Views: 3601

Answers (2)

Oonah

Reputation: 73

Thanks for all the help.

I have found a way to extract plain text from the file, convert it to XHTML, and parse that. The main problem is that I now lose any structure the original document may have had.

The tools are libvisio-tools https://launchpad.net/ubuntu/trusty/+package/libvisio-tools

Installing the package gives you the programs vsd2xhtml, vsd2raw, and vsd2text, which can be run from a terminal to convert the files.
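For driving this from Python, here is a rough sketch. It assumes that vsd2xhtml is on the PATH and writes its result to standard output, and that collecting all character data from the XHTML is good enough; the helper names `vsd_to_xhtml`, `TextCollector`, and `xhtml_text` are invented for this example.

```python
import subprocess
from html.parser import HTMLParser


def vsd_to_xhtml(path: str) -> str:
    """Run vsd2xhtml on a .vsd file and return whatever it prints to stdout.

    Assumption: the tool is installed (libvisio-tools) and emits XHTML on stdout.
    """
    result = subprocess.run(["vsd2xhtml", path], capture_output=True, check=True)
    return result.stdout.decode("utf-8", errors="replace")


class TextCollector(HTMLParser):
    """Collect every non-empty piece of character data, in document order."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


def xhtml_text(xhtml: str) -> list:
    """Return the text fragments found in an XHTML string."""
    collector = TextCollector()
    collector.feed(xhtml)
    collector.close()
    return collector.chunks
```

Usage would be along the lines of `xhtml_text(vsd_to_xhtml("diagram.vsd"))`, where the file name is a placeholder. This only recovers text, so the loss of structure mentioned above still applies.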

Upvotes: 1

Nikolay

Reputation: 12245

You have really picked a strong enemy :)

Unlike the other Office apps, Visio's binary .vsd file format is not exactly Microsoft's "compound document"; that part is basically just a wrapper. The format was created by Visio Corp. back in the 1990s and, AFAIK, was never actually publicly documented.

I would really recommend NOT going with binary .VSD if possible. Recent versions of Visio support the standard Open XML format (.vsdx), which is basically just a bunch of zipped XML files.
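To illustrate the "zipped XML" point, here is a minimal sketch that reads shape text from a .vsdx using only the standard library. It assumes the usual package layout, with pages stored as visio/pages/page1.xml, page2.xml and so on, and shape text held in Text elements in the Visio 2012 main namespace; the function name `vsdx_shape_text` is made up for this example.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# Assumed namespace of the page XML parts inside a .vsdx package.
VISIO_NS = "{http://schemas.microsoft.com/office/visio/2012/main}"
# Matches page parts such as visio/pages/page1.xml, but not visio/pages/pages.xml.
PAGE_PART = re.compile(r"visio/pages/page\d+\.xml")


def vsdx_shape_text(path_or_file):
    """Return {page part name: [(shape ID, shape text), ...]} for a .vsdx file."""
    pages = {}
    with zipfile.ZipFile(path_or_file) as archive:
        for name in sorted(archive.namelist()):
            if not PAGE_PART.fullmatch(name):
                continue
            root = ET.fromstring(archive.read(name))
            shapes = []
            for shape in root.iter(VISIO_NS + "Shape"):
                text_el = shape.find(VISIO_NS + "Text")
                if text_el is None:
                    continue
                # itertext() also picks up text that follows formatting child elements.
                text = "".join(text_el.itertext()).strip()
                if text:
                    shapes.append((shape.get("ID"), text))
            pages[name] = shapes
    return pages
```

Because each shape keeps its ID and its place in the page XML, this route preserves more structure than a plain-text dump.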

AFAIK the only third-party library known to understand binary .vsd is Aspose.Diagram, but it's not free.

Upvotes: 1
