oliver

Reputation: 1

XML file doesn't get attached into the PDF while using ghostscript

My goal is to convert a PDF into a file that conforms to the Factur-X format.

I successfully converted a PDF to PDF/A-3b. Here's the code:

import subprocess

gs_path = r"C:\Program Files\gs\gs10.02.1\bin\gswin64.exe"

def convert_to_pdfa(input_path, output_path, pdfa_def_path):
    command = [
        gs_path,
        "-dPDFA=3",
        "-dBATCH",
        "-dNOPAUSE",
        "-sColorConversionStrategy=UseDeviceIndependentColor",
        "-sDEVICE=pdfwrite",
        "-sOutputFile=" + output_path,
        "-dPDFACompatibilityPolicy=2",
        pdfa_def_path,
        input_path
    ]

    subprocess.run(command)

if __name__ == "__main__":
    input_pdf_path = "facture.pdf"
    output_pdfa_path = "output_pdfa.pdf"
    pdfa_def_path = "PDFA_def.ps"

    convert_to_pdfa(input_pdf_path, output_pdfa_path, pdfa_def_path)

Here's the code in the PDFA_def.ps file:

% Define entries in the document Info dictionary :

/ICCProfile (sRGB_v4_ICC_preference.icc)
def

[ /Title (test)
/DOCINFO pdfmark

% Define an ICC profile :

[/_objdef {icc_PDFA} /type /stream /OBJ pdfmark
[{icc_PDFA} <</N systemdict /ProcessColorModel get /DeviceGray eq {1} {4} ifelse >> /PUT pdfmark
[{icc_PDFA} ICCProfile (r) file /PUT pdfmark

% Define the output intent dictionary :

[/_objdef {OutputIntent_PDFA} /type /dict /OBJ pdfmark
[{OutputIntent_PDFA} <<
/Type /OutputIntent             % Must be so (the standard requires).
/S /GTS_PDFA1                   % Must be so (the standard requires).
/DestOutputProfile {icc_PDFA}            % Must be so (see above).
/OutputConditionIdentifier (sRGBv4 ICC preference)

/PUT pdfmark

% Embed XML file:
[ /_objdef {InvoiceStream} /type /stream /OBJ pdfmark
[ {InvoiceStream} << /Type /EmbeddedFile /Subtype (text/xml) cvn /Params << /ModDate (D:20130121081433+01’00’) >> >> /PUT pdfmark
[ {InvoiceStream} (output.xml) (r) file /PUT pdfmark
[ {InvoiceStream} /CLOSE pdfmark
[ /_objdef {Invoice_FSDict} /type /dict /OBJ pdfmark
[ {Invoice_FSDict} << /Type /FileSpec /F (output.xml) /UF (output.xml) /Desc (ZUGFeRD XML invoice) /AFRelationship /Alternative /EF << /F {InvoiceStream} /UF {InvoiceStream} >> >> /PUT pdfmark
[ /_objdef {AFArray} /type /array /OBJ pdfmark
[ {AFArray} {FSDict} /APPEND pdfmark
[ {Catalog} << /AF {AFArray} >> /PUT pdfmark
[ /Name (output.xml) /FS {FSDict} /EMBED pdfmark
[
/XML
(
...
)
/Ext_Metadata pdfmark

I followed this tutorial on the ZUGFeRD blog.

When I open the PDF, there is no attached XML file.

I compared the PDF I rendered with a PDF that follows the Factur-X format.

The PDF I rendered:

46 0 obj
<</Type/Metadata
/Subtype/XML/Length 1294>>stream
<?xpacket begin='' id='W5M0MpCehiHzreSzNTczkc9d'?>
<?adobe-xap-filters esc="CRLF"?>
<x:xmpmeta xmlns:x='adobe:ns:meta/' x:xmptk='XMP toolkit 2.9.1-13, framework 1.6'>
...
</x:xmpmeta>                                                                                                                                      
<?xpacket end='w'?>
endstream
endobj

A valid PDF:

8 0 obj
<<
/Filter /FlateDecode
/Subtype /XML
/Type /Metadata
/Length 978
>>
stream
    ... binary data ...
endstream
endobj

Upvotes: 0

Views: 592

Answers (3)

Bojan Hrnkas

Reputation: 1694

For anyone having problems producing correct output: be sure to use the correct version of zugferd.ps. Do not download it from somewhere on the internet; take the one provided with your installed version of Ghostscript.

On Windows it is under %programfiles%\gs\gs<version>\lib

Also take the RGB profile from %programfiles%\gs\gs<version>\iccprofiles
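
As a rough illustration of where those files sit, the sketch below derives both paths from the Ghostscript executable path used in the question. It assumes the usual Windows layout, with bin, lib and iccprofiles as sibling directories. The helper name `bundled_gs_files` is hypothetical, and the profile filename is passed in as a parameter because the exact name in your iccprofiles folder may differ.

```python
from pathlib import PureWindowsPath


def bundled_gs_files(gs_exe, profile_name):
    """Return (zugferd.ps path, ICC profile path) for a Ghostscript install.

    Assumption: bin, lib and iccprofiles are sibling directories under
    the versioned install folder.
    """
    root = PureWindowsPath(gs_exe).parent.parent  # the gs<version> folder
    return root / "lib" / "zugferd.ps", root / "iccprofiles" / profile_name


gs_path = r"C:\Program Files\gs\gs10.02.1\bin\gswin64.exe"
# "rgb.icc" is a placeholder name; check the actual file in iccprofiles.
zugferd_ps, icc_profile = bundled_gs_files(gs_path, "rgb.icc")
print(zugferd_ps)
print(icc_profile)
```

Copying both files next to the invoice, rather than reading them in place, keeps everything inside one directory that Ghostscript is permitted to read.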

Upvotes: 0

K J

Reputation: 11739

For Windows users struggling to get the syntax working: use this as a template command, then adapt it slowly until it finally works, at which point you can add -q (if desired).

You need from the installed GS files a copy of

  • zugferd.ps
  • rgb.icc and/or cmyk.icc (ensure it is the correct type for your needs)
  • a pair of source.pdf and source.xml (here I call them invoice-0001)

The result should be invoice-0001-xml.pdf, with the attachment's stated size shown as 0 bytes because it is not a PDF.


gswin##c --permit-file-read="%CD%/" -sDEVICE=pdfwrite -dPDFA=3 -sColorConversionStrategy=RGB -sZUGFeRDProfile="%CD%\rgb.icc" -sZUGFeRDVersion=2p1 -sZUGFeRDConformanceLevel=BASIC -sZUGFeRDXMLFile="%CD%\invoice-0001.xml" -o"%CD%\invoice-0001-xml.pdf" zugferd.ps "%CD%\invoice-0001.pdf"

NOTES

  • gswin##c is the installed console .exe for your system (or the one on your user environment path), where ## is either 32 or 64.
  • "%CD%/" is the current working directory, where all the input and output files are best kept together while testing (you can replace %CD% afterwards). Beware: for the permission only, it MUST be terminated with a forward slash!
  • If using CMYK colour inks, then both the RGB value and the rgb.icc value need changing to CMYK, with the correct cmyk.icc profile.
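
If the command is launched from Python with an argument list, as in the question, no shell expands %CD%, so the directory has to be supplied explicitly. The sketch below builds the permission argument with the forward-slash termination described in the notes. The helper name `permit_file_read_arg` is hypothetical.

```python
import os


def permit_file_read_arg(directory):
    """Build --permit-file-read=<dir>/ with exactly one trailing forward slash."""
    # Strip any trailing slash or backslash first, then add the forward slash.
    return "--permit-file-read=" + directory.rstrip("/\\") + "/"


# Stand-in for "%CD%/" when no shell is involved.
print(permit_file_read_arg(os.getcwd()))
```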


Once you trust the zero-length file, it can open in any suitable application.


So the file will usually open as a file in Edge.


Upvotes: 1

Hermann12

Reputation: 3417

I don't see that your subprocess call follows the command-line description in the Ghostscript documentation, shown here:

gs --permit-file-read=/usr/home/me/zugferd/ -sDEVICE=pdfwrite -dPDFA=3 \
  -sColorConversionStrategy=RGB -sZUGFeRDXMLFile=/usr/home/me/zugferd/invoice.xml \
  -sZUGFeRDProfile=/usr/home/me/rgb.icc -sZUGFeRDVersion=2p1 -sZUGFeRDConformanceLevel=BASIC \
  -o /usr/home/me/zugferd/zugferd.pdf \
  /usr/home/me/zugferd/zugferd.ps /usr/home/me/zugferd/original.pdf
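
Translated into the style of your script, that could look roughly like the sketch below. The function names and parameters are my own, not part of Ghostscript; the flags are exactly the ones from the command above. It is untested against your files, so treat it as a starting point.

```python
import subprocess


def build_zugferd_command(gs_path, read_dir, zugferd_ps, icc_profile,
                          xml_path, input_pdf, output_pdf):
    """Return the Ghostscript argument list for the zugferd.ps workflow."""
    return [
        gs_path,
        "--permit-file-read=" + read_dir,  # directory Ghostscript may read from
        "-sDEVICE=pdfwrite",
        "-dPDFA=3",
        "-sColorConversionStrategy=RGB",
        "-sZUGFeRDXMLFile=" + xml_path,
        "-sZUGFeRDProfile=" + icc_profile,
        "-sZUGFeRDVersion=2p1",
        "-sZUGFeRDConformanceLevel=BASIC",
        "-o", output_pdf,
        zugferd_ps,   # must come before the input PDF
        input_pdf,
    ]


def convert_to_zugferd(**paths):
    """Run Ghostscript and raise CalledProcessError if it exits non-zero."""
    subprocess.run(build_zugferd_command(**paths), check=True)


if __name__ == "__main__":
    # Paths mirror the example command; replace them with your own.
    print(build_zugferd_command(
        gs_path="gs",
        read_dir="/usr/home/me/zugferd/",
        zugferd_ps="/usr/home/me/zugferd/zugferd.ps",
        icc_profile="/usr/home/me/rgb.icc",
        xml_path="/usr/home/me/zugferd/invoice.xml",
        input_pdf="/usr/home/me/zugferd/original.pdf",
        output_pdf="/usr/home/me/zugferd/zugferd.pdf",
    ))
```

Unlike the script in the question, this passes zugferd.ps instead of a hand-edited PDFA_def.ps, and `check=True` makes a failed Ghostscript run visible instead of silently producing an incomplete file.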

There are also Factur-X Python libraries on PyPI.

Upvotes: 0
