TJSoler
TJSoler

Reputation: 145

PDF Creating Server

I've been tasked to create (or seek something that is already working) a centralized server with an API that has the ability to return a PDF file passing some data, and the name of the template, it has to be a robust solution, enterprise ready. The goal is as follows:

So, I've been reading. One way we've thought it may be possible is installing a jasper reports server, and creating the templates in Jaspersoft Studio, then using the API to return the PDF files. A colleague stands for this option, because it's mostly done, but 1º is java and 2º I think it's like using a hammer to crack a nut.

Other option we've been toying with is to use C# with iTextSharp to create a server, and create our own API that returns exactly the PDF with the data we need. Doing this we could have some benefits, like using the database connector we have already made and extracting most of the data from the database, instead of having to pass around a big chunk of data, but as it is bare, it doesn't really have a templating system. We'd have create something from with the XMLWorker or with c# classes but it's not really "easy" as drag and drop. For this case I've been reading about XFA too, but documentation on the iText site is misleading and not clear.

I've been also reading about some other alternatives, like PrinceXML, PDFBox, FOP, etc, but the concept will be the same as iText, we'd have to do it ourselves.

My vote, even if it's more work is to go the route of iText and use HTML / CSS for the templates, but my colleagues claim that the templates should be able to be changed every other week (I doubt it), and be easy. HTML / CSS would be too much work.

So the real question is, how do other business approach this? Did I leave anything out on my search? Is there an easier way to achieve this?

PS: I didn't know if SO would be the correct place for this question, but I'm mostly lost and risking a "too broad question" or "off topic" tag doesn't seem that bad.

EDIT:

Extra data: Right now we are using "Fast-Reports v2 VCL"

Upvotes: 1

Views: 299

Answers (2)

Eugene
Eugene

Reputation: 2878

From my years of experience in working with PDF I think you should pay attention to the following points:

  1. The performance: You may do the fastest performance with API based pdf files generation in comparision to HTML or XML to PDF generation (because of an additional layer of conversion involved). Considering peaks in the load you may want to calculate the cost of scaling up the generation by adding more servers (and estimate the cost of additional servers or resources required per additional pdf file per day).

  2. Ease of iterations and changes: how often will you need to adjust templates? If you are going to create templates just once (with some iterations) but then no changes required then you should be OK by just coding them using the API. Otherwise you should strongly consider using HTML or XML for templates to simplify changes and to decrease the complexity of making changes in templates;

  3. Search and indexing: If you may need to run search among created documents then you should consider storing indexes of documents generated or maybe store more the source data in XML along with PDF file generated;
  4. Long time preservation: you should better conform to PDF/A sub-format in case you are looking for a long time digital preservation for your documents. See the VeraPDF open source initiative that you may use to validate generated and incoming PDF documents against the conformance to PDF/A requirements;
  5. Preserving source files The PDF format itself was not designed to be edited (though there are some PDF editors already) so you may consider the need of preserving the source data to be able to regenerate PDF documents later and probably introduce additional output formats later.

Upvotes: 0

Paul Jowett
Paul Jowett

Reputation: 6581

Your question shows you've been considering the problem in detail before asking for help so I'm sure SO will be friendly.

Certainly one thing you haven't detailed much in your description is the broader functional requirements. You mentioned cracking a nut with a hammer, but I think you are focused mostly on the technology/interfacing. If you consider your broader requirements for the documents you need to create, the variables involved, it's might be a bigger nut that you think.

The approach I would suggest is to prototype solutions, assuming you have some room to do so. From your research, pick maybe the best 3 to try which may well include the custom build you have in mind. Put them through some real use-cases end to end - rough as possible but realistic. One or two key documents you need to output should be used across all solutions. Make sure you are covering the most important or most common requirements in terms of:

  1. Input Format(s) - who can/should be updating templates. What is the ideal requirement and what is the minimum requirement? Output Requirement(s) - who are you delivering to and what formats are essential/desirable
  2. Data Requirement(s) - what are your sources of data and how hard/easy is it to get data from your sources to the reporting system in the format needed?
  3. Template feature(s) - if you are using templates, what features do the templates need? This includes input format(s) but I was mostly thinking of features of the engine like repeating/conditional content, image insertion, table manipulation etc. ie are your invoices, orders and planning documents plain or complex
  4. API requirements - do you have any broader API requirements. You mentioned you use PHP so a PHP library or Web/Web Service is likely to be a good starting point.
  5. Performance - you haven't mentioned any performance characteristics but certainly if you are working at scale (enterprise) it would be worth even rough-measuring the throughput.

iText and Jasper are certainly enterprise grade engines you can rely on. You may wish to look at Docmosis (please note I work for the company) and probably do some searches for PDF libraries that use templates.

A web service interface is possibly a key feature you might want to look at. A REST API is easy to call from PHP and virtually any technology stack. It means you will likely have options about how you can architect a solution, and it's typically easy to prototype against. If you decide to go down the prototyping path and try Docmosis, start with the cloud service since you can prototype/integrate very quickly.

I hope that helps.

Upvotes: 1

Related Questions