comiventor

Reputation: 4122

Using PDFlib/TET along with boto to parse a PDF on Amazon S3

  1. Read the PDF file from Amazon S3 (using boto)
  2. Save it locally as 123.pdf
  3. Open and parse the locally saved PDF using PDFlib/TET

I can currently perform all three steps above, but I want to skip step 2 to save on disk I/O.

It looks like tet_open_document_mem can make TET open a document from memory, but there is no documentation available on how to use it.

Upvotes: 0

Views: 385

Answers (1)

rjs

Reputation: 86

TET offers the so-called PDFlib Virtual Filesystem (PVF) to handle such a situation.

You may use create_pvf() to create a named virtual read-only file from data provided in memory.

The API looks like this (C):

void TET_create_pvf(TET *tet, const char *filename, int len, const void *data, size_t size, const char *optlist)

So it might be used like this:

TET_create_pvf(tet, pvfname, 0, data, length, "");
doc = TET_open_document(tet, pvfname, 0, docoptlist);
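
Since the question uses boto, here is a rough Python sketch of the same pattern. It assumes the TET Python binding exposes create_pvf(filename, data, optlist), open_document(filename, optlist), delete_pvf(filename) and get_errmsg() as methods on the TET object, mirroring the C calls above without the handle and length arguments; check this against the binding shipped with your TET version. The helper name open_pdf_from_memory and the virtual file name /pvf/input.pdf are made up for illustration.

```python
def open_pdf_from_memory(tet, data, pvf_name="/pvf/input.pdf", doc_optlist=""):
    """Register `data` as a PVF file and open it with TET, without touching disk.

    `tet` is assumed to be a TET object from the PDFlib TET Python binding
    (assumption: its methods mirror TET_create_pvf, TET_open_document,
    TET_delete_pvf and TET_get_errmsg from the C API).
    `data` holds the raw PDF bytes already read into memory.
    """
    # Make the in-memory bytes visible to TET under a virtual file name.
    tet.create_pvf(pvf_name, data, "")

    # Open the virtual file exactly like a file on disk.
    doc = tet.open_document(pvf_name, doc_optlist)
    if doc == -1:
        # TET signals failure with -1; drop the virtual file before raising.
        tet.delete_pvf(pvf_name)
        raise RuntimeError(
            "TET could not open %s: %s" % (pvf_name, tet.get_errmsg())
        )
    return doc
```

With boto 2, the bytes can come straight from S3 via key.get_contents_as_string(), so step 2 of the question is no longer needed. Keep the data alive until the document is closed, then call close_document(doc) and delete_pvf(pvf_name) to release the virtual file.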

More details can be found in the TET manual: http://www.pdflib.com/fileadmin/pdflib/pdf/manuals/TET-4.3-manual.pdf

TET_open_document_mem is an old API which is no longer supported.

Upvotes: 1
