ebeg

Reputation: 428

Why is a PDF containing only one field around 500 KB?

Here you can download a PDF with a single AcroForm field; its size is exactly 427 KB.

If I remove this single field, the file is only 3 KB. Why does this happen? I tried analysing it with PDF Debugger and nothing looked unusual to me.


Upvotes: 0

Views: 557

Answers (1)

Tilman Hausherr

Reputation: 18861

There's an embedded "Arial" font in the acroform default resources, see Root/AcroForm/DR/Font/Arial/FontDescriptor/FontFile2.

Either you or whoever created the PDF added it for no reason: the font is not used or referenced. To verify this for the AcroForm default resources, check whether the /DA entry (default appearance) of each field contains the font name.

When you removed the field, you somehow also removed the font from the AcroForm default resources. (You didn't say how you removed it.)

Here's some code to remove the unused fonts from the default resources (null checks mostly missing):
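To make the /DA check concrete, here is a minimal sketch in plain Java (no PDFBox needed) that pulls the font name out of a default appearance string such as `/Arial 12 Tf 0 g`. The class name `DaFontName` and the regular expression are illustrative assumptions, not part of PDFBox. Comparing the extracted name exactly avoids a false match such as `/Arial` being found inside `/ArialMT`.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not a PDFBox class.
final class DaFontName {
    // Matches "/<name> <size> Tf", the font selection part of a /DA string.
    private static final Pattern TF =
            Pattern.compile("/(\\S+)\\s+-?[0-9]*\\.?[0-9]+\\s+Tf\\b");

    // Returns the font resource name without the leading slash, if one is found.
    static Optional<String> extract(String da) {
        if (da == null) {
            return Optional.empty();
        }
        Matcher m = TF.matcher(da);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(extract("/Arial 12 Tf 0 g"));
        System.out.println(extract("/ArialMT 0 Tf 0 g"));
        System.out.println(extract("0 g"));
    }
}
```

A font in the default resources would then count as used only if its name equals one of the extracted names.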

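    // imports needed at the top of the file:
    // import java.util.ArrayList;
    // import java.util.List;
    // import org.apache.pdfbox.cos.COSDictionary;
    // import org.apache.pdfbox.cos.COSName;
    // import org.apache.pdfbox.pdmodel.PDResources;
    // import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
    // import org.apache.pdfbox.pdmodel.interactive.form.PDField;
    // import org.apache.pdfbox.pdmodel.interactive.form.PDVariableText;

    // doc is the already loaded PDDocument
    PDAcroForm acroForm = doc.getDocumentCatalog().getAcroForm();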
    PDResources defaultResources = acroForm.getDefaultResources();
    COSDictionary fontDict = (COSDictionary) defaultResources.getCOSObject().getDictionaryObject(COSName.FONT);
    List<String> defaultAppearances = new ArrayList<>();
    List<COSName> fontDeletionList = new ArrayList<>();
    for (PDField field : acroForm.getFieldTree())
    {
        if (field instanceof PDVariableText)
        {
            PDVariableText vtField = (PDVariableText) field;
            defaultAppearances.add(vtField.getDefaultAppearance());
        }
    }
    for (COSName fontName : defaultResources.getFontNames())
    {
        if (COSName.HELV.equals(fontName) || COSName.ZA_DB.equals(fontName))
        {
            // Adobe default, always keep
            continue;
        }
        boolean found = false;
        for (String da : defaultAppearances)
        {
            if (da != null && da.contains("/" + fontName.getName()))
            {
                found = true;
                break;
            }
        }
        System.out.println(fontName + ": " + found);
        if (!found)
        {
            fontDeletionList.add(fontName);
        }
    }
    System.out.println("deletion list: " + fontDeletionList);
    for (COSName fontName : fontDeletionList)
    {
        fontDict.removeItem(fontName);
    }

The resulting file is now 5 KB.

I haven't checked the annotations. Some of them also have a /DA string, but it is unclear whether the AcroForm default resources fonts are meant to be used when reconstructing a missing appearance stream.

Update: Here's some additional code to replace Arial with Helv:

    for (PDField field : acroForm.getFieldTree())
    {
        if (field instanceof PDVariableText)
        {
            PDVariableText vtField = (PDVariableText) field;
            String defaultAppearance = vtField.getDefaultAppearance();
            // match "/Arial " including the space so that e.g. /ArialMT is left alone
            if (defaultAppearance != null && defaultAppearance.startsWith("/Arial "))
            {
                // "/Arial " is 7 characters; keep the rest (size, Tf, color) unchanged
                vtField.setDefaultAppearance("/Helv " + defaultAppearance.substring(7));
                vtField.getWidgets().get(0).setAppearance(null); // this removes the font usage
                vtField.setValue(vtField.getValueAsString()); // rebuilds the appearance
            }
            defaultAppearances.add(vtField.getDefaultAppearance());
        }
    }
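The `substring(7)` call above relies on the /DA string starting with the font name followed by a single space. As a hedged alternative, the string rewrite could be done with a small helper in plain Java that also works when the font operator is not at the start. `DaFontSwap` and `replaceFont` are hypothetical names, not PDFBox API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not a PDFBox class.
final class DaFontSwap {
    // Replaces the first "/oldName" that is followed by whitespace with "/newName".
    // Longer names such as "/ArialMT" are not touched when oldName is "Arial".
    static String replaceFont(String da, String oldName, String newName) {
        if (da == null) {
            return null;
        }
        Pattern p = Pattern.compile("/" + Pattern.quote(oldName) + "(?=\\s)");
        return p.matcher(da).replaceFirst(Matcher.quoteReplacement("/" + newName));
    }

    public static void main(String[] args) {
        System.out.println(replaceFont("/Arial 12 Tf 0 g", "Arial", "Helv"));
        System.out.println(replaceFont("0 g /Arial 12 Tf", "Arial", "Helv"));
        System.out.println(replaceFont("/ArialMT 12 Tf 0 g", "Arial", "Helv"));
    }
}
```

The result would then be passed to `vtField.setDefaultAppearance(...)` in place of the concatenation.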

Note that this may not be a good idea, because the standard 14 fonts support only a limited set of characters. Try

    vtField.setValue("Ayşe");

and you'll get an exception.
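One way to guard against this before calling `setValue` is to test the text up front. The sketch below assumes that the replacement font is used with WinAnsiEncoding and that the JDK's `windows-1252` charset is a close enough stand-in for it; the two are similar but not identical, so treat the result as a rough pre-check only. `WinAnsiCheck` is a hypothetical helper name.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical helper, not a PDFBox class.
final class WinAnsiCheck {
    // True if every character of the text can be encoded in windows-1252,
    // used here as an approximation of WinAnsiEncoding (an assumption).
    static boolean fitsWindows1252(String text) {
        CharsetEncoder encoder = Charset.forName("windows-1252").newEncoder();
        return encoder.canEncode(text);
    }

    public static void main(String[] args) {
        System.out.println(fitsWindows1252("Ayse"));
        System.out.println(fitsWindows1252("Ay\u015fe")); // "Ayşe", s with cedilla
    }
}
```

If the check fails, keeping the embedded font (or embedding a subset of it) is probably the safer choice than switching to Helv.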

More general code to replace a font can be found in this answer.

Upvotes: 2
