Reputation: 11
I am using the Apache Tika parser v1.24. We have large PDF files. When parsing them, we get the following error:
Exception: Your document contained more than 100000 characters, and so your requested limit has been reached. To receive the full text of the document, increase your limit. (Text up to the limit is however available).
I tried setting the parameter of bodyContentHandler to -1, but it didn't work.
Thanks in advance
Upvotes: 1
Views: 333
Reputation: 732
You can use PDFBox to split the PDF file into one document per page, then parse each page separately. Look at the Splitter class.
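A minimal sketch of that idea, assuming the PDFBox 2.x API (in PDFBox 3.x documents are opened with Loader.loadPDF instead of PDDocument.load). The class name PdfPageSplitter and the method splitIntoPages are made up for illustration. Each returned byte array is a single-page PDF that could then be handed to the parser on its own, so no single parse has to hold the whole text:

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.multipdf.Splitter;
import org.apache.pdfbox.pdmodel.PDDocument;

// Hypothetical helper class, not part of PDFBox or Tika.
public class PdfPageSplitter {

    // Splits the document into single-page PDFs and returns each one as bytes.
    public static List<byte[]> splitIntoPages(PDDocument document) throws IOException {
        Splitter splitter = new Splitter(); // default: one page per output document
        List<byte[]> pages = new ArrayList<>();
        for (PDDocument page : splitter.split(document)) {
            // Save each page document, then close it; the source stays open meanwhile.
            try (PDDocument p = page;
                 ByteArrayOutputStream out = new ByteArrayOutputStream()) {
                p.save(out);
                pages.add(out.toByteArray());
            }
        }
        return pages;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: PdfPageSplitter <file.pdf>");
            return;
        }
        // PDDocument.load is the PDFBox 2.x entry point (assumption: 2.x on the classpath).
        try (PDDocument document = PDDocument.load(new File(args[0]))) {
            List<byte[]> pages = splitIntoPages(document);
            System.out.println("Split into " + pages.size() + " single-page documents");
        }
    }
}
```

For very large files you may prefer to write each page to a temporary file instead of keeping all pages in memory.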
Upvotes: 1