Reputation: 1
Here is the code I executed.
from tika import parser
file = 'sample.pdf'
file_data = parser.from_file(file)
text = file_data['content']
print(text)
I am getting this error:
[WARNI] Tika server returned status: 500
None
Upvotes: 0
Views: 1799
Reputation: 86
I encountered the same error, and for me restarting the Tika server worked.
Running this should yield the PID:
ps aux | grep java | grep tika
Then kill the Tika server process and restart your Python app.
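If you would rather do this from Python, here is a rough sketch of the same steps. It assumes a Unix-like system where `ps aux` prints the PID in the second column; the helper names `find_tika_pids` and `kill_tika_servers` are made up for illustration and are not part of tika-python.

```python
import os
import signal
import subprocess


def find_tika_pids(ps_output):
    """Return PIDs from `ps aux` output whose line mentions both java and tika."""
    pids = []
    for line in ps_output.splitlines():
        # Same filter as `grep java | grep tika`, skipping any grep process itself.
        if "java" in line and "tika" in line and "grep" not in line:
            fields = line.split()
            # Assumption: `ps aux` layout is USER PID ..., so the PID is field 1.
            if len(fields) > 1 and fields[1].isdigit():
                pids.append(int(fields[1]))
    return pids


def kill_tika_servers():
    """Send SIGTERM to every matching process and return the PIDs signalled."""
    ps_output = subprocess.run(
        ["ps", "aux"], capture_output=True, text=True, check=True
    ).stdout
    pids = [pid for pid in find_tika_pids(ps_output) if pid != os.getpid()]
    for pid in pids:
        os.kill(pid, signal.SIGTERM)
    return pids
```

Call `kill_tika_servers()` once, then restart your app so that python-tika starts a fresh server on the next parse.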
Upvotes: 2
Reputation: 1145
This 500 error seems to be returned when Tika Server fails for reasons such as running out of heap memory, along with other exceptions I have yet to figure out. I see those exceptions in the Tika Server log. In my client, which uses python-tika, I will retry the query (parser.from_file()) a few times whenever it returns None as the content. This is just a workaround.
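A minimal sketch of that retry workaround is below. The helper name `parse_with_retry` and its `attempts` and `delay` parameters are my own choices, not part of python-tika; the only thing it relies on is that the parse call returns a dict with a 'content' key, as in the question.

```python
import time


def parse_with_retry(parse, path, attempts=3, delay=1.0):
    """Call parse(path) until 'content' is non-empty or attempts run out.

    `parse` is any callable that returns a dict like the one from
    tika's parser.from_file. The last result is returned either way,
    so the caller can still inspect it after all attempts fail.
    """
    result = {}
    for attempt in range(1, attempts + 1):
        result = parse(path) or {}
        if result.get("content"):
            return result
        if attempt < attempts:
            # Give the server a moment to recover before asking again.
            time.sleep(delay)
    return result


# Usage with python-tika (needs the tika package and a reachable server):
#   from tika import parser
#   file_data = parse_with_retry(parser.from_file, 'sample.pdf')
#   print(file_data.get('content'))
```

Retrying only hides intermittent failures. If the server fails on the same file every time, all attempts will come back empty.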
To actually avoid the 500 error, though, we need to find out what makes the Tika Server fail (or crash?).
The warning message comes from callServer() in tika.py.
Upvotes: 0