Ketty K.

Reputation: 13

PDFBox incorrect text appearance after copy/paste

I’m using PDFBox 2.0.4 to create PDF documents with AcroForms. Here is my test code:

PDDocument document = new PDDocument();
PDPage page = new PDPage(PDRectangle.A4);
document.addPage(page);

PDAcroForm acroForm = new PDAcroForm(document);
document.getDocumentCatalog().setAcroForm(acroForm);

String dir = "../testPdfBox/src/main/resources/fonts/";
PDType0Font font = PDType0Font.load(document, new File(dir + "Roboto-Regular.ttf"));

PDResources resources = new PDResources();
String fontName = resources.add(font).getName();
acroForm.setDefaultResources(resources);

String defaultAppearanceString = String.format("/%s 12 Tf 0 g", fontName);
acroForm.setDefaultAppearance(defaultAppearanceString);

PDTextField field = new PDTextField(acroForm);
field.setPartialName("SampleField");
field.setDefaultAppearance(defaultAppearanceString);
acroForm.getFields().add(field);

PDAnnotationWidget widget = field.getWidgets().get(0);
PDRectangle rect = new PDRectangle(50, 750, 200, 50);
widget.setRectangle(rect);
widget.setPage(page);
widget.setPrinted(true);

page.getAnnotations().add(widget);

field.setValue("Sample field 123456");

acroForm.flatten();

document.save("target/SimpleForm.pdf");
document.close();

Everything works fine, but when I copy text from the created document and paste it into Notepad or Word, it turns into squares:

􀀷􀁅􀁑􀁔􀁐􀁉􀀄􀁊􀁍􀁉􀁐􀁈􀀄􀀕􀀖􀀗􀀘􀀙􀀚

I searched a lot about this problem. The most common explanation is that the created PDF has no ToUnicode CMap, so I inspected my document with CanOpener for Acrobat.

Indeed, there is no ToUnicode CMap, but everything works properly if I do not call acroForm.flatten(). When the form fields are not flattened, I can copy and paste text from the document and it looks correct. Nevertheless, I need all fields to be flattened.

So, I have two questions:

  1. Why does copy/paste fail in the flattened form when it works in the non-flattened one?

  2. What can I do to avoid the copy/paste problem? Is the only solution to create the ToUnicode CMap myself, like in this example?

My test PDF files are available here.

Upvotes: 1

Views: 507

Answers (1)

Tilman Hausherr

Reputation: 18851

Please replace

PDType0Font font = PDType0Font.load(document, new File(dir + "Roboto-Regular.ttf"));

with

PDType0Font font = PDType0Font.load(document, new FileInputStream(dir + "Roboto-Regular.ttf"), false);

This makes sure that the font is embedded in full and not just as a subset.
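
As background on the symptom itself: a TrueType font loaded through PDType0Font is written so that the page content stores glyph IDs rather than Unicode characters, so copy/paste only works when the viewer can map those glyph IDs back to text (normally via a ToUnicode CMap). The following is a toy model in plain Java, not PDFBox code and not a description of PDFBox internals. The class name ToUnicodeDemo, the glyph IDs, and the private-use fallback are all assumptions made up for illustration; a real font assigns its own glyph IDs and a real viewer may choose a different fallback.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model: text stored as glyph IDs, decoded with or without a ToUnicode map. */
final class ToUnicodeDemo {

    // Hypothetical character -> glyph ID table (a real font defines its own).
    private static final Map<Character, Integer> GLYPH_IDS = new HashMap<>();
    static {
        GLYPH_IDS.put('S', 55);
        GLYPH_IDS.put('a', 69);
        GLYPH_IDS.put('m', 81);
        GLYPH_IDS.put('p', 84);
        GLYPH_IDS.put('l', 80);
        GLYPH_IDS.put('e', 73);
    }

    /** Encodes text the way a glyph-ID based font does: one glyph ID per character. */
    static int[] encode(String text) {
        int[] codes = new int[text.length()];
        for (int i = 0; i < text.length(); i++) {
            Integer gid = GLYPH_IDS.get(text.charAt(i));
            if (gid == null) {
                throw new IllegalArgumentException("No glyph for: " + text.charAt(i));
            }
            codes[i] = gid;
        }
        return codes;
    }

    /** Builds the reverse table, playing the role of a ToUnicode CMap. */
    static Map<Integer, String> buildToUnicode() {
        Map<Integer, String> toUnicode = new HashMap<>();
        for (Map.Entry<Character, Integer> e : GLYPH_IDS.entrySet()) {
            toUnicode.put(e.getValue(), String.valueOf(e.getKey()));
        }
        return toUnicode;
    }

    /**
     * Decodes glyph IDs back to text. Without a mapping, this sketch falls back
     * to a private-use code point (assumed behaviour, chosen for illustration),
     * which most applications render as a square.
     */
    static String decode(int[] codes, Map<Integer, String> toUnicode) {
        StringBuilder sb = new StringBuilder();
        for (int gid : codes) {
            String text = (toUnicode == null) ? null : toUnicode.get(gid);
            if (text != null) {
                sb.append(text);
            } else {
                sb.appendCodePoint(0xE000 + gid);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int[] codes = encode("Sample");
        System.out.println("With mapping:    " + decode(codes, buildToUnicode()));
        System.out.println("Without mapping: " + decode(codes, null));
    }
}
```

With the mapping present, the glyph IDs decode back to the original string; without it, the result consists only of private-use characters, which matches the kind of output shown in the question.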

Upvotes: 1
