I am trying to use the Apache POI transformer in Alfresco to transform an Excel file to HTML, so far without success.
In <Project-home>/src/main/amp/config/alfresco/extension/subsystems/Transformers/default/default/transformers.properties I set:
content.transformer.Poi.priority=70
content.transformer.Poi.extensions.xlsx.html.supported=true
I then set log4j.logger.org.alfresco.repo.content.transform.TransformerDebug=TRACE
and log4j.logger.org.alfresco.util.exec.RuntimeExec=TRACE,
but the logs show that the transformer is never called when an Excel file is transformed.
EDIT:
The Mimetypes webscript (GET /alfresco/s/mimetypes?mimetype={mimetype?}) returns:
application/vnd.openxmlformats-officedocument.spreadsheetml.sheet - xlsx
Extractors: org.alfresco.repo.content.metadata.PoiMetadataExtracter
Transformable To:
application/eps = Complex via: application/pdf
application/pdf = Using a Direct Open Office Connection
application/vnd.ms-excel = Using a Direct Open Office Connection
application/vnd.oasis.opendocument.spreadsheet = Using a Direct Open Office Connection
application/vnd.oasis.opendocument.spreadsheet-template = Using a Direct Open Office Connection
application/vnd.sun.xml.calc = Using a Direct Open Office Connection
application/vnd.sun.xml.calc.template = Using a Direct Open Office Connection
application/xhtml+xml = org.alfresco.repo.content.transform.TikaAutoContentTransformer
image/bmp = Complex via: application/pdf
image/cgm = Complex via: application/pdf
image/gif = Complex via: application/pdf
image/ief = Complex via: application/pdf
image/jp2 = Complex via: application/pdf
image/jpeg = org.alfresco.repo.content.transform.OOXMLThumbnailContentTransformer
image/png = Complex via: application/pdf
image/tiff = Complex via: application/pdf
image/vnd.adobe.photoshop = Complex via: application/pdf
image/vnd.adobe.premiere = Complex via: application/pdf
image/x-cmu-raster = Complex via: application/pdf
image/x-dwt = Complex via: application/pdf
image/x-portable-anymap = Complex via: application/pdf
image/x-portable-bitmap = Complex via: application/pdf
image/x-portable-graymap = Complex via: application/pdf
image/x-portable-pixmap = Complex via: application/pdf
image/x-raw-adobe = Complex via: image/jpeg
image/x-raw-canon = Complex via: image/jpeg
image/x-raw-fuji = Complex via: image/jpeg
image/x-raw-hasselblad = Complex via: image/jpeg
image/x-raw-kodak = Complex via: image/jpeg
image/x-raw-leica = Complex via: image/jpeg
image/x-raw-minolta = Complex via: image/jpeg
image/x-raw-nikon = Complex via: image/jpeg
image/x-raw-olympus = Complex via: image/jpeg
image/x-raw-panasonic = Complex via: image/jpeg
image/x-raw-pentax = Complex via: image/jpeg
image/x-raw-red = Complex via: image/jpeg
image/x-raw-sigma = Complex via: image/jpeg
image/x-raw-sony = Complex via: image/jpeg
image/x-xbitmap = Complex via: application/pdf
image/x-xpixmap = Complex via: application/pdf
image/x-xwindowdump = Complex via: application/pdf
text/html = org.alfresco.repo.content.transform.PoiHssfContentTransformer
text/plain = org.alfresco.repo.content.transform.TikaAutoContentTransformer
text/xml = org.alfresco.repo.content.transform.TikaAutoContentTransformer
This shows that the POI transformer (PoiHssfContentTransformer) is listed for text/html.
Upvotes: 2
Views: 607
I solved the issue by creating a complex transformation pipeline along the path XLSX => PDF => HTML. I used coolwanglu's pdf2htmlEX, which can be a bit tricky to install, so use this script for installation on Ubuntu. Don't bother installing it on CentOS < 7, as there is an issue with Python.
As for the extension:
src/main/amp/config/alfresco/extension/subsystems/Transformers/default/default/transformers.properties
#increase the default maximum allowed source size
content.transformer.OpenOffice.extensions.xlsx.pdf.maxSourceSizeKBytes=5120
#disable ootb pdf->html and xlsx->html transformation path (Apparently has no effect)
content.transformer.OpenOffice.extensions.xlsx.html.supported=false
content.transformer.complex.OpenOffice.PdfBox.extensions.*.html.available=false
content.transformer.complex.OpenOffice.PdfBox.extensions.*.html.supported=false
#PDF to html transformer
content.transformer.pdf2htmlex.available=true
#content.transformer.pdf2htmlex.thresholdCount=5
#content.transformer.default.timeoutMs=180000
content.transformer.pdf2htmlex.priority=50
content.transformer.pdf2htmlex.extensions.pdf.html.supported=true
content.transformer.pdf2htmlex.extensions.pdf.html.priority=50
content.transformer.pdf2htmlex.extensions.pdf.html.maxSourceSizeKBytes=9999
#XLSX to HTML pipeline
content.transformer.complex.Xlsx.Html.pipeline=*|pdf|*
content.transformer.complex.Xlsx.Html.available=true
content.transformer.complex.Xlsx.Html.extensions.xlsx.html.priority=30
content.transformer.complex.Xlsx.Html.extensions.xlsx.html.supported=true
The transformer bean:
<bean id="transformer.worker.pdf2htmlex"
class="org.alfresco.repo.content.transform.RuntimeExecutableContentTransformerWorker">
<property name="mimetypeService">
<ref bean="mimetypeService"/>
</property>
<property name="checkCommand">
<bean class="org.alfresco.util.exec.RuntimeExec">
<property name="commandsAndArguments">
<map>
<entry key=".*">
<list>
<value>pdf2htmlEX</value>
<value>-v</value>
</list>
</entry>
</map>
</property>
<!--<property name="errorCodes">
<value>1</value>
</property>-->
</bean>
</property>
<property name="transformCommand">
<bean class="org.alfresco.util.exec.RuntimeExec">
<property name="commandsAndArguments">
<map>
<entry key=".*">
<list>
<value>pdf2htmlEX</value>
<value>--embed</value>
<value>CFIJO</value>
<value>${source}</value>
<value>${target}</value>
</list>
</entry>
</map>
</property>
<property name="processDirectory" value="/"/>
</bean>
</property>
<property name="explicitTransformations">
<list>
<bean class="org.alfresco.repo.content.transform.ExplictTransformationDetails">
<constructor-arg>
<value>application/pdf</value>
</constructor-arg>
<constructor-arg>
<value>text/html</value>
</constructor-arg>
</bean>
</list>
</property>
</bean>
<bean id="transformer.pdf2htmlex" class="org.alfresco.repo.content.transform.ProxyContentTransformer"
init-method="register"
parent="baseContentTransformer">
<property name="worker" ref="transformer.worker.pdf2htmlex"/>
<!--The next two properties were added because of the line at
https://github.com/Alfresco/community-edition/blob/afde3f58f91567b6f7eaa0bbac5e5adc38087fe0/projects/repository/source/java/org/alfresco/repo/content/transform/AbstractContentTransformer2.java#L135
after getting the following error on startup:
Cannot create dynamic transformer transformer.complex.Xlsx.Html as sub transformers could not be found or
created ("*|pdf|pdf2htmlex").
Incidentally, they had no effect, as the pipeline property needs to be in the form *|pdf|*. They are left here
in case this changes in a future release of Alfresco, and this way we are able to register custom transformers
with the contentTransformerRegistry on startup.
-->
<property name="registry" ref="contentTransformerRegistry"/>
<property name="registerTransformer" value="true"/>
</bean>
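Before wiring this into Alfresco, it may help to check that the two external steps work on their own. The following is only a rough sketch of the XLSX => PDF => HTML path run by hand, not what Alfresco does internally (Alfresco talks to OpenOffice over a direct connection). The function names `build_commands` and `run_pipeline` are made up for illustration, and it assumes that a LibreOffice/OpenOffice binary is on the PATH as `soffice` and that your pdf2htmlEX build accepts `--dest-dir`. The `--embed CFIJO` flags are the ones from the bean above.

```python
import shutil
import subprocess
from pathlib import Path


def build_commands(xlsx: Path, workdir: Path) -> list:
    """Return the two command lines for XLSX -> PDF -> HTML (hypothetical helper)."""
    pdf = workdir / (xlsx.stem + ".pdf")
    html_name = xlsx.stem + ".html"
    return [
        # Step 1 (assumption): "soffice" is on the PATH and converts to PDF headlessly.
        ["soffice", "--headless", "--convert-to", "pdf",
         "--outdir", str(workdir), str(xlsx)],
        # Step 2: same --embed flags as the transformCommand in the bean;
        # the output is a bare file name, placed in workdir via --dest-dir.
        ["pdf2htmlEX", "--embed", "CFIJO",
         "--dest-dir", str(workdir), str(pdf), html_name],
    ]


def run_pipeline(xlsx: Path, workdir: Path) -> Path:
    """Run both steps in order and return the expected path of the HTML file."""
    workdir.mkdir(parents=True, exist_ok=True)
    for cmd in build_commands(xlsx, workdir):
        # Fail early with a clear message if a tool is not installed.
        if shutil.which(cmd[0]) is None:
            raise FileNotFoundError(cmd[0] + " not found on PATH")
        subprocess.run(cmd, check=True)
    return workdir / (xlsx.stem + ".html")
```

Calling `run_pipeline(Path("report.xlsx"), Path("/tmp/out"))` should leave `/tmp/out/report.html` behind if both tools are installed. If this fails outside Alfresco, the pipeline in transformers.properties will most likely fail as well.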
Upvotes: 2