Sha Li

Reputation: 445

Use tika with python, runtimeerror: unable to start tika server

I am trying to use the tika package to parse files. Tika is installed successfully, and I started tika-server-1.18.jar from cmd with the command: java -jar tika-server-1.18.jar

My code in Jupyter is:

import tika 
from tika import parser
parsed = parser.from_file('')

However, I receive the error below:

2018-07-25 10:20:13,325 [MainThread ] [WARNI] Failed to see startup log message; retrying...
2018-07-25 10:20:18,329 [MainThread ] [WARNI] Failed to see startup log message; retrying...
2018-07-25 10:20:23,332 [MainThread ] [WARNI] Failed to see startup log message; retrying...
2018-07-25 10:20:28,340 [MainThread ] [ERROR] Tika startup log message not received after 3 tries.
2018-07-25 10:20:28,340 [MainThread ] [ERROR] Failed to receive startup confirmation from startServer.

RuntimeError: Unable to start Tika Server.

Upvotes: 30

Views: 42548

Answers (7)

Teoh Sin Yee

Reputation: 31

I got the same error and solved it using the steps below:

  1. Check the tika server log file (it is usually located in C:/Users/your_user_name/AppData/Local/Temp/)

    2023-04-02 08:06:47,277 [Thread-1 (pr] [ERROR] Unable to run java; is it installed?
    2023-04-02 08:06:47,278 [Thread-1 (pr] [ERROR] Failed to receive startup confirmation from startServer.

  2. This suggests that Java is not installed, so check whether Java is installed using:

    java -version

  3. If it's not installed, you may download it here: https://www.java.com/en/download/.

  4. If the error persists, try starting the Tika server manually using:

    java -jar tika-server.jar

  • Remember to run it from the directory where your jar file is located. Now it should work.
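
Steps 1 and 2 can also be done from Python. This is a minimal sketch, assuming the Tika log files sit in the directory returned by tempfile.gettempdir() and are named like tika*.log (the exact file names are an assumption); java_version_output and tika_log_files are hypothetical helper names, not part of tika.

```python
import glob
import os
import shutil
import subprocess
import tempfile


def java_version_output():
    """Return the text printed by `java -version`, or None if java is not on PATH."""
    java = shutil.which("java")
    if java is None:
        return None
    # `java -version` writes its banner to stderr, not stdout.
    result = subprocess.run([java, "-version"], capture_output=True, text=True)
    return (result.stderr or result.stdout).strip()


def tika_log_files():
    """List files matching tika*.log in the temp directory (assumed log location)."""
    pattern = os.path.join(tempfile.gettempdir(), "tika*.log")
    return sorted(glob.glob(pattern))


if __name__ == "__main__":
    version = java_version_output()
    print("java -version:", version if version else "java not found on PATH")
    # Print the last few lines of each log, where the startup error usually appears.
    for path in tika_log_files():
        print(f"--- last lines of {path} ---")
        with open(path, encoding="utf-8", errors="replace") as fh:
            print("".join(fh.readlines()[-5:]))
```

If java_version_output() returns None, step 3 (installing Java) applies.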

Upvotes: 1

Karianjahi Njeri

Reputation: 1

If you are using Ubuntu 20.04 (or 18.04) like me, the solution is to install Oracle JDK 17. Do the following:

sudo add-apt-repository ppa:linuxuprising/java
sudo apt update
sudo apt install oracle-java17-installer

Type java -version in the terminal. You should see output like the following:

java version "17.0.1" 2021-10-19 LTS
Java(TM) SE Runtime Environment (build 17.0.1+12-LTS-39)
Java HotSpot(TM) 64-Bit Server VM (build 17.0.1+12-LTS-39, mixed mode, sharing)

Tika should then be able to extract text from your PDF in Python:

parser.from_file(<your pdf file>)

Upvotes: 0

Aayush Shah

Reputation: 11

I faced a similar issue and tried all the steps mentioned here, but nothing helped. Here is how I solved it:

  1. Checked the log files of tika and tika-server. On Windows, you can find them in C:/Users/your_user_name/AppData/Local/Temp/
  2. Found that the tika-server log mentioned a port-already-in-use error.

Here is the relevant log snippet:

INFO: Setting the server's publish address to be http://localhost:9998/
WARNING: FAILED SelectChannelConnector@localhost:9998: java.net.BindException: Address already in use: bind
java.net.BindException: Address already in use: bind
        at sun.nio.ch.Net.bind0(Native Method)
        at sun.nio.ch.Net.bind(Unknown Source)
        at sun.nio.ch.Net.bind(Unknown Source)
        at sun.nio.ch.ServerSocketChannelImpl.bind(Unknown Source)
        at sun.nio.ch.ServerSocketAdaptor.bind(Unknown Source)
        at org.eclipse.jetty.server.nio.SelectChannelConnector.open(SelectChannelConnector.java:187)
        at org.eclipse.jetty.server.AbstractConnector.doStart(AbstractConnector.java:316)
        at org.eclipse.jetty.server.nio.SelectChannelConnector.doStart(SelectChannelConnector.java:265)
        at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64)
        at org.eclipse.jetty.server.Server.doStart(Server.java:293)
        at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64)
        at org.apache.cxf.transport.http_jetty.JettyHTTPServerEngine.addServant(JettyHTTPServerEngine.java:417)
        at org.apache.cxf.transport.http_jetty.JettyHTTPDestination.activate(JettyHTTPDestination.java:179)
        at org.apache.cxf.transport.AbstractObservable.setMessageObserver(AbstractObservable.java:49)
        at org.apache.cxf.binding.AbstractBindingFactory.addListener(AbstractBindingFactory.java:95)
        at org.apache.cxf.jaxrs.JAXRSBindingFactory.addListener(JAXRSBindingFactory.java:88)
        at org.apache.cxf.endpoint.ServerImpl.start(ServerImpl.java:123)
        at org.apache.cxf.jaxrs.JAXRSServerFactoryBean.create(JAXRSServerFactoryBean.java:206)
        at org.apache.tika.server.TikaServerCli.main(TikaServerCli.java:213)
  3. This clearly indicated that another process was already running on the same port, so I just needed to kill the java process running on port 9998 (which I assumed might have been defunct).
  4. Once I killed the process in Task Manager, I reran the Python script and it worked correctly.
  5. To cross-check, you can also run the tika-server.jar file in the same path (C:/Users/your_user_name/AppData/Local/Temp/) using the command below and check whether it fails or runs correctly: java -jar tika-server.jar
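
Before killing anything, you can check whether port 9998 is taken. This is a minimal sketch using only the standard socket module; port_in_use is a hypothetical helper name, 9998 is the default port from the log snippet, and it only detects a process that is actively listening on the port.

```python
import socket


def port_in_use(port, host="localhost"):
    """Return True if something is already accepting TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 when the connection succeeds, i.e. a listener exists.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    if port_in_use(9998):
        print("Port 9998 is busy: stop the stale java/Tika process first.")
    else:
        print("Port 9998 is free.")
```

If it reports the port as busy, finding and ending the owning java process (step 3) should let tika start its own server again.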

Hope this will be helpful to someone in future.

Upvotes: 1

A. Pond

Reputation: 370

After you import tika, you need to initialize the Java server:

import tika
tika.initVM()
from tika import parser
parsed = parser.from_file('')  # the file name should go here

Upvotes: 12

Skoopski_Potato

Reputation: 143

Download Java. If you already have a version of Java installed, try updating it to the latest version. The version that works for me is 1.18.

Upvotes: 3

autry.richard

Reputation: 161

According to Apache Tika's site, all new versions of the tika-server.jar will require Java 8.

24 April 2018: Apache Tika Release Apache Tika 1.18 has been released! This release includes bug fixes (e.g. extraction from grouped shapes in PPT), security fixes and upgrades to dependencies. PLEASE NOTE: The next versions will require Java 8. Please see the CHANGES.txt file for the full list of changes in the release and have a look at the download page for more information on how to obtain Apache Tika 1.18.

The current, outdated docs for the tika Python library claim that Java 7 is needed, but Java 8 must now be installed. This is because the current version of tika-server.jar is automatically downloaded at runtime if it is not found in your temp directory.

After installing Java 8, my basic test code launched the server and worked without error.
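
To check whether an installed Java meets the Java 8 minimum, you can parse the banner printed by java -version. This is a minimal sketch; java_major_version and meets_minimum are hypothetical helpers, and the regex assumes the usual version "..." format of that banner. It handles both the legacy "1.8.0_..." numbering (Java 8) and the modern "17.0.1" numbering (Java 17).

```python
import re


def java_major_version(version_banner):
    """Extract the major Java version from `java -version` output, or None if unparsable."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_banner)
    if match is None:
        return None
    first = int(match.group(1))
    if first == 1 and match.group(2) is not None:
        # Legacy scheme: "1.x" means Java x (e.g. "1.8.0_281" is Java 8).
        return int(match.group(2))
    return first


def meets_minimum(version_banner, minimum=8):
    """True if the banner reports Java `minimum` or newer."""
    major = java_major_version(version_banner)
    return major is not None and major >= minimum


if __name__ == "__main__":
    banner = 'java version "17.0.1" 2021-10-19 LTS'
    print(java_major_version(banner))  # 17
    print(meets_minimum(banner))  # True
```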

Upvotes: 16

user1613312

Reputation: 374

You have not passed an argument (specified a file) in your line:

parsed = parser.from_file('')

Give it a file to chew on e.g.,

parsed = parser.from_file('myfile.txt')

The server didn't start, and presumably this is what triggers the "no log" warning (see line 644 of the source on GitHub).

Then another error message tells you it isn't going to play along.
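
One way to catch the empty-string mistake early is to validate the path before tika gets involved. This is a minimal sketch, assuming the tika-python package is installed for the actual parse; parse_file is a hypothetical wrapper name, not part of tika.

```python
import os


def parse_file(path):
    """Validate `path`, then hand it to tika's parser (hypothetical wrapper).

    Raises FileNotFoundError for an empty or missing path before tika
    tries to start its server, so the real mistake is reported clearly.
    """
    if not path or not os.path.isfile(path):
        raise FileNotFoundError(f"no such file: {path!r}")
    # Imported lazily so the check above works even where tika is not installed.
    from tika import parser
    return parser.from_file(path)


if __name__ == "__main__":
    try:
        parse_file("")
    except FileNotFoundError as exc:
        print(exc)
```

With a real file, parse_file('myfile.txt') returns whatever parser.from_file returns (in tika-python, a dict with "content" and "metadata" keys).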

Upvotes: 1
