Reputation: 11

Decoding and decompressing AI9_DataStream within .eps files

Context: I am trying to automate the inspection of EPS files to detect a list of attributes, such as whether a file contains locked layers, embedded bitmap images, etc.

So far we have found that some of these attributes can be detected by inspecting the raw EPS file data and its accompanying metadata (similar to the information returned by ImageMagick). However, in files created by Illustrator 9 and above, the vast majority of this information appears to be encoded within the "AI9_DataStream" portion of the file. This data is ASCII85-encoded and compressed. We have had some success getting at it by using https://github.com/huandu/node-ascii85 to decode and Node's zlib library to decompress. (Our project is written in Node / JavaScript.) However, in roughly half of our test files the decompression step fails, throwing Z_DATA_ERROR / "incorrect data check".

Our method responsible for trying to decode:

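import zlib from 'zlib';
import ascii85 from 'ascii85';

export const decode = eps =>
   new Promise((resolve, reject) => {
     // every line of the data stream starts with '%', so strip "newline + %"
     const lineDelimiters = /\r\n%|\r%|\n%/g;
     const internal = eps.match(
       /(%AI9_DataStream)([\s\S]*?)(AI9_PrivateDataEnd)/
     );
     const hasDataStream = internal && internal.length >= 2;

     // return here, otherwise internal[2] below throws when there is no match
     if (!hasDataStream) return resolve('');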

     const encoded = internal[2].replace(lineDelimiters, '');
     const decoded = ascii85.decode(encoded);

     try {
       zlib.unzip(decoded, (err, buffer) => {
         // files can crash this process, for now we need to allow it
         if (err) resolve('');
         else resolve(buffer.toString('utf8'));
       });
     } catch (err) {
       reject(err);
     }
   });

I am wondering if anyone out there has had any experience with this issue and has some insight into what might be causing this and whether there is an alternative avenue to explore for reliably decoding this data. Information on this topic seems a bit sparse so really anything that could get us going in the right direction would be very much appreciated.

Note: The buffers produced by the ASCII85 decoding all start with the same 78 9c header, which should indicate standard zlib compression (and the data does in fact decompress into parsable output about half the time without error).
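
To make the header observation concrete, here is a minimal sketch that parses the two-byte zlib header as laid out in RFC 1950. describeZlibHeader is a hypothetical helper name, not part of Node's zlib module. Keep in mind that a valid header only shows the stream starts correctly: "incorrect data check" is zlib's message for a mismatch of the Adler-32 checksum at the end of the stream, so the bytes after the header can still be wrong.

```javascript
// Parse the two-byte zlib header (CMF, FLG) described in RFC 1950.
// describeZlibHeader is an illustrative helper, not a library function.
function describeZlibHeader(buffer) {
  if (buffer.length < 2) {
    throw new Error('Need at least two bytes to read a zlib header');
  }
  const cmf = buffer[0];
  const flg = buffer[1];
  const cinfo = cmf >> 4; // log2(window size) - 8 when the method is deflate

  return {
    compressionMethod: cmf & 0x0f,             // 8 means deflate
    windowSize: 1 << (cinfo + 8),              // in bytes
    headerCheckValid: (cmf * 256 + flg) % 31 === 0,
    hasPresetDictionary: (flg & 0x20) !== 0,
    compressionLevelHint: flg >> 6,            // 0..3, informational only
  };
}

// Usage: inspect the header seen in the decoded buffers.
console.log(describeZlibHeader(Buffer.from([0x78, 0x9c])));
```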

Upvotes: 0

Answers (2)

Hub

Reputation: 11

Apparently we were misreading something about the ASCII85 encoding. There is a ~> delimiter at the end of the encoded block (the ASCII85 end-of-data marker). It, along with everything after it, needs to be omitted from the string before decoding and unzipping.

So instead of:

/(%AI9_DataStream)([\s\S]*?)(AI9_PrivateDataEnd)/

Use:

/(%AI9_DataStream)([\s\S]*?)(~>)/

With that change you get the correct encoded / compressed data. So far this has produced human-readable / regexable data in all of our current test cases, so unless we are thrown another curve, that seems to be the answer.
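
For anyone wanting to try this without extra dependencies, here is a rough sketch of the whole path with the ~> cut-off applied. It assumes the stream is laid out as described in the question (lines prefixed with %, ASCII85 text terminated by ~>, zlib-compressed payload). It uses a small hand-written ASCII85 decoder rather than node-ascii85, and decodeAscii85, flushGroup, extractDataStream and decodeDataStream are all hypothetical names, so treat it as an illustration rather than a drop-in replacement.

```javascript
const zlib = require('zlib');

// Turn one group of five base-85 digits into up to four bytes.
function flushGroup(group, byteCount, out) {
  let value = 0;
  for (const digit of group) value = value * 85 + digit;
  const bytes = [
    Math.floor(value / 0x1000000) % 256,
    Math.floor(value / 0x10000) % 256,
    Math.floor(value / 0x100) % 256,
    value % 256,
  ];
  for (let i = 0; i < byteCount; i++) out.push(bytes[i]);
}

// Minimal Adobe-style ASCII85 decoder (illustrative, not the node-ascii85 API).
// Expects the text WITHOUT the trailing "~>" marker.
function decodeAscii85(text) {
  const out = [];
  let group = [];
  for (const ch of text) {
    if (/\s/.test(ch)) continue;              // whitespace is ignored
    if (ch === 'z' && group.length === 0) {   // 'z' abbreviates four zero bytes
      out.push(0, 0, 0, 0);
      continue;
    }
    const digit = ch.charCodeAt(0) - 33;      // valid characters are '!'..'u'
    if (digit < 0 || digit > 84) {
      throw new Error(`Invalid ASCII85 character: ${JSON.stringify(ch)}`);
    }
    group.push(digit);
    if (group.length === 5) {
      flushGroup(group, 4, out);
      group = [];
    }
  }
  if (group.length === 1) throw new Error('Truncated ASCII85 group');
  if (group.length > 1) {
    const byteCount = group.length - 1;       // n+1 digits encode n bytes
    while (group.length < 5) group.push(84);  // pad the final group with 'u'
    flushGroup(group, byteCount, out);
  }
  return Buffer.from(out);
}

// Capture everything between the %AI9_DataStream marker and the first "~>".
function extractDataStream(eps) {
  const match = eps.match(/%AI9_DataStream([\s\S]*?)~>/);
  if (!match) return null;
  // Strip "newline + %" first: '%' is also a valid ASCII85 character.
  return match[1].replace(/\r\n%|\r%|\n%/g, '');
}

// Returns the decompressed stream as text, or '' when no stream is present.
function decodeDataStream(eps) {
  const encoded = extractDataStream(eps);
  if (encoded === null) return '';
  return zlib.inflateSync(decodeAscii85(encoded)).toString('utf8');
}
```

Unlike the version in the question, this throws on bad data instead of resolving to an empty string, which makes it easier to see which files still fail and why.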

Upvotes: 1

KenS

Reputation: 31199

The only reliable method for getting content from PostScript is to run it through a PostScript interpreter, because PostScript is a programming language.

If you stick to a specific workflow with well understood input, then you may have some success in simple parsing, but that's about the only likely scenario which will work.

Note that EPS files don't have 'layers' and certainly don't have 'locked' layers.

You haven't actually pointed to a working example, but I suspect the content of the AI9_DataStream is not relevant to the EPS. It's probably a means for Illustrator to include its own native file format inside the EPS file without affecting a PostScript interpreter. This is how it works with AI-produced PDF files.

This means that when you reopen the EPS file with Adobe Illustrator, it ignores the EPS and uses the embedded native file, which magically grants you the ability to edit the file, including features like layers which cannot be represented in the EPS.

Upvotes: 0
