Karlovsky120
Karlovsky120

Reputation: 6352

PDF Handling in Java

I have created a program that should one day become a PDF editor

It's purpose will be saving GUI's textual content to the PDF, and loading it from it. GUI resembles text editor, but it only has certain fields(JTextAreas, actually).

It can look like this (this is only one page, it can have many more, also upper and lower margins are cut out of the picture) It should actually resemble A4 in pixel size.

GUI picture

I have looked around for a bit for PDF libraries and found out that iText could suit my PDF creating needs, however, if I understood it correct, it retirevs text from a whole page as a string which won't work for me, because I will need to detect diferent fields/paragaphs/orsomething to be able to load them back into the program.

Now, I'm a bit lazy, but I don't want to spend hours going trough numerus PDF libraries just to find out that they won't work for me.

Instead, I'm asking someone with a bit more Java PDF handling experience to recommend me one according to my needs.

Or maybe recommend me how to add invisible parts to PDF which will help my program to determine where is it exactly situated insied a PDF file...

Just to be clear (I formed my question wrong before), only thing I need to put in my PDF is text, and that's all I need to later be able to get out. My program should be able to read PDF's which he created himself...

Also, because of the designated use of files created with this program, they need to be in the PDF format.

Upvotes: 1

Views: 1505

Answers (3)

yms
yms

Reputation: 10418

Assuming that your application will only consume PDF files generated by the same application, there is one part of the PDF specification called Marked Content, that was introduced precisely for this purpose. Using Marked Content you can specify the structure of the text in your document (chapter, paragraph, etc). Read Chapter 14 - Document Interchange of the PDF Reference Document for more details.

Upvotes: 1

Vivin Paliath
Vivin Paliath

Reputation: 95518

Short Answer: Use an intermediate format like JSON or XML.

Long Answer: You're using PDF's in a manner that they wasn't designed for. PDF's were not designed to store data; they were designed to present and format data in an portable form. Furthermore, a PDF is a very "heavy" way to store data. I suggest storing your data in another manner, perhaps in a format like JSON or XML.

The advantage now is that you are not tied to a specific output-format like PDF. This can come in handy later on if you decide that you want to export your data into another format (like a Word document, or an image) because you now have a common representation.

I found this link and another link that provides examples that show you how to store and read back metadata in your PDF. This might be what you're looking for, but again, I don't recommend it.

Upvotes: 2

gigadot
gigadot

Reputation: 8969

If you really insist on using PDF to store data, I suggest that you store the actual data in either XML or RDF and then attach that to the PDF file when you generate it. Then you can read the XML back for the data.

Upvotes: 1

Related Questions