Rawfodog

Reputation: 11

Apache Tika parser char limit exception

I am using the Apache Tika parser v1.24. We have large PDF files, and when parsing them we get the following error:

Exception: Your document contained more than 100000 characters, and so your requested limit has been reached. To receive the full text of the document, increase your limit. (Text up to the limit is however available).

I tried setting the parameter of BodyContentHandler to -1, but it didn't work.

Thanks in advance

Upvotes: 1

Views: 333

Answers (1)

marek.kapowicki

Reputation: 732

Use PDFBox to split the PDF file into one document per page, then parse each page separately. Look at the Splitter class.
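A minimal sketch of that idea, assuming PDFBox 2.x is on the classpath (in 3.x the loading call differs). The class name PerPageSplitter and the method splitIntoPages are made up for illustration. Each returned byte array is a standalone single-page PDF that could be wrapped in a ByteArrayInputStream and handed to Tika, so that no single parse call sees the whole document:

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.multipdf.Splitter;
import org.apache.pdfbox.pdmodel.PDDocument;

// Hypothetical helper class, not part of PDFBox or Tika.
public class PerPageSplitter {

    // Splits the document into single-page PDFs and returns each one as bytes.
    static List<byte[]> splitIntoPages(PDDocument document) throws IOException {
        List<byte[]> pages = new ArrayList<>();
        // A default Splitter starts a new document after every page.
        for (PDDocument page : new Splitter().split(document)) {
            try (PDDocument single = page;
                 ByteArrayOutputStream out = new ByteArrayOutputStream()) {
                single.save(out);
                pages.add(out.toByteArray());
            }
        }
        return pages;
    }

    public static void main(String[] args) throws IOException {
        // PDFBox 2.x loading call; args[0] is the path to the large PDF.
        try (PDDocument document = PDDocument.load(new File(args[0]))) {
            List<byte[]> pages = splitIntoPages(document);
            System.out.println("Split into " + pages.size() + " single-page PDFs");
        }
    }
}
```

Holding every page in memory may itself be costly for very large files; processing each split page inside the loop instead of collecting them would avoid that.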

Upvotes: 1
