Dark Star1

Reputation: 7393

Is it possible to use apache POI transformation in alfresco?

I am trying to use the apache poi transformer in alfresco to transform an excel file to HTML without success so far.

In <Project-home>/src/main/amp/config/alfresco/extension/subsystems/Transformers/default/default/transformers.properties I set:

   content.transformer.Poi.priority=70
   content.transformer.Poi.extensions.xlsx.html.supported=true

I then set log4j.logger.org.alfresco.repo.content.transform.TransformerDebug=TRACE and log4j.logger.org.alfresco.util.exec.RuntimeExec=TRACE, but the logs show that the transformer is never called when transforming Excel files.
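
For reference, here are the two logger settings mentioned above, written out as they would appear in a log4j properties file. Which file they go in is an assumption on my part (typically a custom log4j override on the extension classpath), so adjust to your setup:

```properties
# Assumed location: a custom log4j properties override picked up by Alfresco
# Trace which transformers are considered and selected
log4j.logger.org.alfresco.repo.content.transform.TransformerDebug=TRACE
# Trace external commands launched through RuntimeExec
log4j.logger.org.alfresco.util.exec.RuntimeExec=TRACE
```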

EDIT: The Mimetypes webscript (GET /alfresco/s/mimetypes?mimetype={mimetype?}) returns

application/vnd.openxmlformats-officedocument.spreadsheetml.sheet - xlsx
Extractors: org.alfresco.repo.content.metadata.PoiMetadataExtracter
Transformable To:

    application/eps = Complex via: application/pdf
    application/pdf = Using a Direct Open Office Connection
    application/vnd.ms-excel = Using a Direct Open Office Connection
    application/vnd.oasis.opendocument.spreadsheet = Using a Direct Open Office Connection
    application/vnd.oasis.opendocument.spreadsheet-template = Using a Direct Open Office Connection
    application/vnd.sun.xml.calc = Using a Direct Open Office Connection
    application/vnd.sun.xml.calc.template = Using a Direct Open Office Connection
    application/xhtml+xml = org.alfresco.repo.content.transform.TikaAutoContentTransformer
    image/bmp = Complex via: application/pdf
    image/cgm = Complex via: application/pdf
    image/gif = Complex via: application/pdf
    image/ief = Complex via: application/pdf
    image/jp2 = Complex via: application/pdf
    image/jpeg = org.alfresco.repo.content.transform.OOXMLThumbnailContentTransformer
    image/png = Complex via: application/pdf
    image/tiff = Complex via: application/pdf
    image/vnd.adobe.photoshop = Complex via: application/pdf
    image/vnd.adobe.premiere = Complex via: application/pdf
    image/x-cmu-raster = Complex via: application/pdf
    image/x-dwt = Complex via: application/pdf
    image/x-portable-anymap = Complex via: application/pdf
    image/x-portable-bitmap = Complex via: application/pdf
    image/x-portable-graymap = Complex via: application/pdf
    image/x-portable-pixmap = Complex via: application/pdf
    image/x-raw-adobe = Complex via: image/jpeg
    image/x-raw-canon = Complex via: image/jpeg
    image/x-raw-fuji = Complex via: image/jpeg
    image/x-raw-hasselblad = Complex via: image/jpeg
    image/x-raw-kodak = Complex via: image/jpeg
    image/x-raw-leica = Complex via: image/jpeg
    image/x-raw-minolta = Complex via: image/jpeg
    image/x-raw-nikon = Complex via: image/jpeg
    image/x-raw-olympus = Complex via: image/jpeg
    image/x-raw-panasonic = Complex via: image/jpeg
    image/x-raw-pentax = Complex via: image/jpeg
    image/x-raw-red = Complex via: image/jpeg
    image/x-raw-sigma = Complex via: image/jpeg
    image/x-raw-sony = Complex via: image/jpeg
    image/x-xbitmap = Complex via: application/pdf
    image/x-xpixmap = Complex via: application/pdf
    image/x-xwindowdump = Complex via: application/pdf
    text/html = org.alfresco.repo.content.transform.PoiHssfContentTransformer
    text/plain = org.alfresco.repo.content.transform.TikaAutoContentTransformer
    text/xml = org.alfresco.repo.content.transform.TikaAutoContentTransformer

This shows that a POI transformer (PoiHssfContentTransformer) is registered for text/html.

Upvotes: 2

Views: 607

Answers (1)

Dark Star1

Reputation: 7393

I solved the issue by creating a complex transformation pipeline along the path XLSX => PDF => HTML. I used coolwanglu's pdf2htmlEX, which can be a bit tricky to install, so use this script for installation on Ubuntu. Don't bother installing it on CentOS < 7, as there is an issue with Python. As for the extension:
src/main/amp/config/alfresco/extension/subsystems/Transformers/default/default/transformers.properties

#increase the default maximum allowed source size
content.transformer.OpenOffice.extensions.xlsx.pdf.maxSourceSizeKBytes=5120

#disable ootb pdf->html and xlsx->html transformation path (Apparently has no effect)
content.transformer.OpenOffice.extensions.xlsx.html.supported=false
content.transformer.complex.OpenOffice.PdfBox.extensions.*.html.available=false
content.transformer.complex.OpenOffice.PdfBox.extensions.*.html.supported=false

#PDF to html transformer
content.transformer.pdf2htmlex.available=true
#content.transformer.pdf2htmlex.thresholdCount=5
#content.transformer.default.timeoutMs=180000
content.transformer.pdf2htmlex.priority=50
content.transformer.pdf2htmlex.extensions.pdf.html.supported=true
content.transformer.pdf2htmlex.extensions.pdf.html.priority=50
content.transformer.pdf2htmlex.extensions.pdf.html.maxSourceSizeKBytes=9999


#XLSX to HTML pipeline
content.transformer.complex.Xlsx.Html.pipeline=*|pdf|*
content.transformer.complex.Xlsx.Html.available=true
content.transformer.complex.Xlsx.Html.extensions.xlsx.html.priority=30
content.transformer.complex.Xlsx.Html.extensions.xlsx.html.supported=true

The transformer bean:

<bean id="transformer.worker.pdf2htmlex"
      class="org.alfresco.repo.content.transform.RuntimeExecutableContentTransformerWorker">
    <property name="mimetypeService">
        <ref bean="mimetypeService"/>
    </property>
    <property name="checkCommand">
        <bean class="org.alfresco.util.exec.RuntimeExec">
            <property name="commandsAndArguments">
                <map>
                    <entry key=".*">
                        <list>
                            <value>pdf2htmlEX</value>
                            <value>-v</value>
                        </list>
                    </entry>
                </map>
            </property>
            <!--<property name="errorCodes">
                <value>1</value>
            </property>-->
        </bean>
    </property>
    <property name="transformCommand">
        <bean class="org.alfresco.util.exec.RuntimeExec">
            <property name="commandsAndArguments">
                <map>
                    <entry key=".*">
                        <list>
                            <value>pdf2htmlEX</value>
                            <value>--embed</value>
                            <value>CFIJO</value>
                            <value>${source}</value>
                            <value>${target}</value>
                        </list>
                    </entry>
                </map>
            </property>
            <property name="processDirectory" value="/"/>
        </bean>
    </property>
    <property name="explicitTransformations">
        <list>
            <bean class="org.alfresco.repo.content.transform.ExplictTransformationDetails">
                <constructor-arg>
                    <value>application/pdf</value>
                </constructor-arg>
                <constructor-arg>
                    <value>text/html</value>
                </constructor-arg>
            </bean>
        </list>
    </property>
</bean>

<bean id="transformer.pdf2htmlex" class="org.alfresco.repo.content.transform.ProxyContentTransformer"
      init-method="register"
      parent="baseContentTransformer">
    <property name="worker" ref="transformer.worker.pdf2htmlex"/>
    <!--The next two properties were added because of the line at
    https://github.com/Alfresco/community-edition/blob/afde3f58f91567b6f7eaa0bbac5e5adc38087fe0/projects/repository/
    source/java/org/alfresco/repo/content/transform/AbstractContentTransformer2.java#L135 after getting the
    following error on startup:
    Cannot create dynamic transformer transformer.complex.Xlsx.Html as sub transformers could not be found or
    created ("*|pdf|pdf2htmlex"). As it turned out, they had no effect, because the pipeline property needs to be in
    the form *|pdf|*. They are left here in case this changes in a future release of Alfresco and custom transformers
    can then be registered with the contentTransformerRegistry on startup.
    -->
    <property name="registry" ref="contentTransformerRegistry"/>
    <property name="registerTransformer" value="true"/>
</bean>

Upvotes: 2
