Reputation: 997
I have integrated Tesseract OCR into Alfresco 5.0.d. My requirement is to convert the contents of PDF files to text.
This works fine for small files.
But when I upload larger files, say more than 50 MB,
I get the exception below and the PDF is not fully converted. Only the first few pages are converted to text.
Please refer to the logs below:
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:170)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
at sun.security.ssl.InputRecord.read(InputRecord.java:503)
Has anyone faced the same issue? Please help.
Thanks in advance.
Upvotes: 1
Views: 700
Reputation: 1346
You may have to increase the content transformation size limit for PDF to text in the alfresco-global.properties file.
You can set the size limit for a transformation using these properties.
If you are using OOoDirect:
content.transformer.complex.OpenOffice.Pdf2swf.extensions.doc.swf.maxSourceSizeKBytes=5120
content.transformer.complex.OpenOffice.Pdf2swf.extensions.docx.swf.maxSourceSizeKBytes=5120
If you are using OOoJodConverter:
content.transformer.complex.JodConverter.Pdf2swf.extensions.doc.swf.maxSourceSizeKBytes=5120
content.transformer.complex.OpenOffice.Pdf2swf.extensions.docx.swf.maxSourceSizeKBytes=5120
Refer to this community question: https://community.alfresco.com/thread/211670-changing-transformation-limits-version-5b
Upvotes: 2
Reputation: 439
I'm a bit surprised. Alfresco already includes PDFBox, which handles PDF to TXT conversion, so you don't need to use Tesseract.
Your trace also looks a bit odd. To see what's going on with the transformers, set log4j.logger.org.alfresco.repo.content.transform.TransformerDebug and log4j.logger.org.alfresco.repo.content.transform to DEBUG.
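As a rough sketch, the two logger settings mentioned above could be written as the following log4j properties lines. Which file they belong in is an assumption here: Alfresco installs commonly pick up overrides from a custom log4j properties file on the extension classpath, but check where your installation actually reads its log4j configuration.

```properties
# Assumed location: a log4j override file read by your Alfresco install
# (the exact file name and path depend on your setup).

# Detailed trace of which transformers are tried for each transformation
log4j.logger.org.alfresco.repo.content.transform.TransformerDebug=DEBUG

# General debug output for the content transformation package
log4j.logger.org.alfresco.repo.content.transform=DEBUG
```

After restarting Alfresco and uploading the large PDF again, the log should show which transformer is selected for the PDF to text conversion and where it stops.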
Upvotes: 2