Akshay Sahni

Reputation: 1

The Tika server returns status 500. I have the latest version of Tika installed from pip, along with Java 8. I need to extract the text from my PDF.

Here is the code I executed.

from tika import parser

file = 'sample.pdf'
file_data = parser.from_file(file)
text = file_data['content']
print(text)

I am getting this error:

[WARNI] Tika server returned status: 500

None

Upvotes: 0

Views: 1799

Answers (2)

ANTRIKSH SINGH

Reputation: 86

I encountered the same error, and restarting the Tika server fixed it for me.

Running this should yield the PID:

ps aux | grep java | grep tika

Then kill the Tika server process and restart your Python app.

Upvotes: 2

freeAR

Reputation: 1145

This 500 error seems to be returned when the Tika Server fails for reasons such as running out of heap memory, or because of other exceptions I have yet to figure out. I see those exceptions in the Tika Server log. As a workaround, in my client using python-tika, I'll start retrying the query (parser.from_file()) a few times when it returns None as the content.
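The retry workaround could look roughly like the sketch below. The helper name parse_with_retry and its attempts and delay parameters are made up for illustration and are not part of python-tika; the parse function is passed in as an argument so the helper does not depend on any particular client.

```python
import time


def parse_with_retry(parse, path, attempts=3, delay=1.0):
    """Call parse(path) until the result has non-empty 'content'.

    parse    -- a callable such as tika's parser.from_file (assumption:
                it returns a dict that may contain a 'content' key)
    attempts -- hypothetical limit on the number of tries
    delay    -- hypothetical pause in seconds between tries
    Returns the last result, which may still lack content.
    """
    parsed = None
    for attempt in range(1, attempts + 1):
        parsed = parse(path)
        # Stop as soon as the server hands back some extracted text.
        if parsed and parsed.get("content"):
            return parsed
        # Wait before the next try, but not after the final one.
        if attempt < attempts:
            time.sleep(delay)
    return parsed
```

With python-tika it would be called as parse_with_retry(parser.from_file, 'sample.pdf'). Retrying only helps with transient failures; a file that reliably crashes the server will fail on every attempt.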

But in order to avoid the 500 error, we need to find out what makes the Tika Server fail (or crash?).
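As a first step toward diagnosing the failure, the client can at least surface the status instead of silently printing None. The sketch below assumes the dict returned by parser.from_file() carries a 'status' key next to 'content'; that matches my reading of python-tika, but check it against your installed version. The helper name extract_text is hypothetical.

```python
def extract_text(parsed):
    """Return the extracted text, or raise with the server status.

    Assumption: 'parsed' is the dict returned by tika's
    parser.from_file(), with an optional 'status' key holding the
    HTTP status code and a 'content' key holding the text or None.
    """
    status = parsed.get("status")
    content = parsed.get("content")
    # A non-200 status means the server failed; report the code.
    if status is not None and status != 200:
        raise RuntimeError(f"Tika server returned status {status}")
    # A 200 with no text usually means nothing could be extracted.
    if content is None:
        raise RuntimeError("Tika returned no content")
    return content
```

The actual cause still has to be read from the Tika Server log; this only makes the failure visible at the point where it happens.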

The warning message comes from callServer() in tika.py.

Upvotes: 0
