Reputation: 445
I am trying to use the tika package to parse files. Tika is installed successfully, and I started tika-server-1.18.jar from cmd with:
java -jar tika-server-1.18.jar
My code in Jupyter is:
import tika
from tika import parser
parsed = parser.from_file('')
However, I receive the error below:
2018-07-25 10:20:13,325 [MainThread ] [WARNI] Failed to see startup log message; retrying...
2018-07-25 10:20:18,329 [MainThread ] [WARNI] Failed to see startup log message; retrying...
2018-07-25 10:20:23,332 [MainThread ] [WARNI] Failed to see startup log message; retrying...
2018-07-25 10:20:28,340 [MainThread ] [ERROR] Tika startup log message not received after 3 tries.
2018-07-25 10:20:28,340 [MainThread ] [ERROR] Failed to receive startup confirmation from startServer.
RuntimeError: Unable to start Tika Server.
Upvotes: 30
Views: 42548
Reputation: 31
I got the same error and solved it with the steps below:
Check the Tika server log file (usually located at C:/Users/your_user_name/AppData/Local/Temp/). Mine showed:
2023-04-02 08:06:47,277 [Thread-1 (pr] [ERROR] Unable to run java; is it installed?
2023-04-02 08:06:47,278 [Thread-1 (pr] [ERROR] Failed to receive startup confirmation from startServer.
This suggests Java is not installed, so check whether it is with:
java -version
If it's not installed, you may download it here: https://www.java.com/en/download/.
If the error persists, try starting the Tika server manually with:
java -jar tika-server.jar
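The same Java check can also be run from Python (for example inside the Jupyter notebook) before calling the parser. This is a minimal sketch using only the standard library; the function name java_version_output is hypothetical and not part of tika. It relies on the fact that `java -version` prints its banner to stderr.

```python
import shutil
import subprocess
from typing import Optional


def java_version_output() -> Optional[str]:
    """Return the text printed by `java -version`, or None if java is not on PATH.

    `java -version` writes its banner to stderr rather than stdout,
    so stderr is preferred and stdout is used as a fallback.
    """
    java = shutil.which("java")
    if java is None:
        # Same situation as the log line "Unable to run java; is it installed?"
        return None
    result = subprocess.run(
        [java, "-version"], capture_output=True, text=True, check=False
    )
    return (result.stderr or result.stdout).strip()


if __name__ == "__main__":
    banner = java_version_output()
    if banner is None:
        print("java not found on PATH; install Java before using tika")
    else:
        print(banner)
```

If this returns None inside Jupyter but `java -version` works in a terminal, the notebook process is probably seeing a different PATH.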
Upvotes: 1
Reputation: 1
If you are using Ubuntu 20.04 (or 18.04) like me, the solution is to install Oracle JDK 17. Do the following:
sudo add-apt-repository ppa:linuxuprising/java
sudo apt update
sudo apt install oracle-java17-installer
Type java -version in the terminal. You should see output similar to the following:
java version "17.0.1" 2021-10-19 LTS
Java(TM) SE Runtime Environment (build 17.0.1+12-LTS-39)
Java HotSpot(TM) 64-Bit Server VM (build 17.0.1+12-LTS-39, mixed mode, sharing)
tika should then be able to extract text from your PDF in Python:
parser.from_file(<your pdf file>)
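The result of parser.from_file is a dict, and the extracted text is normally under its "content" key. A small helper like the one below (hypothetical, not part of tika) avoids crashing when that value is missing or None, which can happen when Tika finds no text in a file. This is a sketch under that assumption about the result's shape.

```python
from typing import Any, Mapping


def extracted_text(parsed: Mapping[str, Any]) -> str:
    """Return the stripped text from a tika-python parse result, or '' if absent.

    Assumes the result is a dict-like object whose "content" key holds the
    text; the value may be None when no text was extracted.
    """
    content = parsed.get("content")
    return content.strip() if content else ""
```

Usage: text = extracted_text(parser.from_file("your_file.pdf")), where your_file.pdf is a placeholder for your own path.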
Upvotes: 0
Reputation: 11
I faced a similar issue. I tried all the steps mentioned here, but nothing helped. Here is how I solved it:
Open the Tika server log in C:/Users/your_user_name/AppData/Local/Temp/ and check the log snippet below:
INFO: Setting the server's publish address to be http://localhost:9998/
WARNING: FAILED SelectChannelConnector@localhost:9998: java.net.BindException: Address already in use: bind
java.net.BindException: Address already in use: bind
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Unknown Source)
at sun.nio.ch.Net.bind(Unknown Source)
at sun.nio.ch.ServerSocketChannelImpl.bind(Unknown Source)
at sun.nio.ch.ServerSocketAdaptor.bind(Unknown Source)
at org.eclipse.jetty.server.nio.SelectChannelConnector.open(SelectChannelConnector.java:187)
at org.eclipse.jetty.server.AbstractConnector.doStart(AbstractConnector.java:316)
at org.eclipse.jetty.server.nio.SelectChannelConnector.doStart(SelectChannelConnector.java:265)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64)
at org.eclipse.jetty.server.Server.doStart(Server.java:293)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64)
at org.apache.cxf.transport.http_jetty.JettyHTTPServerEngine.addServant(JettyHTTPServerEngine.java:417)
at org.apache.cxf.transport.http_jetty.JettyHTTPDestination.activate(JettyHTTPDestination.java:179)
at org.apache.cxf.transport.AbstractObservable.setMessageObserver(AbstractObservable.java:49)
at org.apache.cxf.binding.AbstractBindingFactory.addListener(AbstractBindingFactory.java:95)
at org.apache.cxf.jaxrs.JAXRSBindingFactory.addListener(JAXRSBindingFactory.java:88)
at org.apache.cxf.endpoint.ServerImpl.start(ServerImpl.java:123)
at org.apache.cxf.jaxrs.JAXRSServerFactoryBean.create(JAXRSServerFactoryBean.java:206)
at org.apache.tika.server.TikaServerCli.main(TikaServerCli.java:213)
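The log shows that port 9998 was already in use. I stopped the process holding port 9998 (which I assumed might have been defunct). Then, from C:/Users/your_user_name/AppData/Local/Temp/, I started the server manually with the command below and checked whether it failed or ran correctly:
java -jar tika-server.jar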
Hope this helps someone in the future.
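To spot the "Address already in use" situation before starting Tika, you can test whether anything is already listening on port 9998 (the publish address shown in the log above). This is a minimal standard-library sketch; port_in_use is a hypothetical helper name.

```python
import socket


def port_in_use(port: int = 9998, host: str = "localhost") -> bool:
    """Return True if something is already accepting connections on host:port.

    9998 is the port in the "publish address" line of the log above.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on a successful connection instead of raising
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    if port_in_use(9998):
        print("Port 9998 is taken; stop the old process before starting tika-server")
    else:
        print("Port 9998 is free")
```

On Windows, `netstat -ano | findstr :9998` shows the PID of the process holding the port.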
Upvotes: 1
Reputation: 370
After you import tika, you need to initialize the Java server:
import tika
tika.initVM()
from tika import parser
parsed = parser.from_file('your_file.pdf')  # put your own file name here
Upvotes: 12
Reputation: 143
Download Java. If you already have a version of Java installed, try updating it to the latest version. The version that works for me is 1.18.
Upvotes: 3
Reputation: 161
According to the Apache Tika site, all new versions of tika-server.jar require Java 8.
24 April 2018: Apache Tika Release Apache Tika 1.18 has been released! This release includes bug fixes (e.g. extraction from grouped shapes in PPT), security fixes and upgrades to dependencies. PLEASE NOTE: The next versions will require Java 8. Please see the CHANGES.txt file for the full list of changes in the release and have a look at the download page for more information on how to obtain Apache Tika 1.18.
The current, outdated docs for the tika Python library claim that Java 7 is needed, but Java 8 must now be installed. This is because the current version of tika-server.jar is downloaded automatically at runtime if it is not found in your temp directory.
After installing Java 8, my basic test code launched the server and worked without error.
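To check the Java 8 requirement programmatically, you can parse the major version out of the `java -version` banner. Java 8 and earlier report themselves as "1.x" (for example "1.8.0_181"), while newer releases report the major number directly (for example "17.0.1"). This is a sketch; the helper names are hypothetical, and the default minimum of 8 reflects the Tika 1.18 notice quoted above.

```python
import re
from typing import Optional


def java_major_version(banner: str) -> Optional[int]:
    """Extract the major Java version from `java -version` output.

    Handles the legacy scheme ("1.8.0_181" -> 8) and the modern one
    ("17.0.1" -> 17). Returns None if no quoted version is found.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', banner)
    if match is None:
        return None
    first, second = match.group(1), match.group(2)
    # Java 8 and earlier use the "1.x" numbering
    if first == "1" and second is not None:
        return int(second)
    return int(first)


def meets_tika_requirement(banner: str, minimum: int = 8) -> bool:
    """True if the Java version in `banner` is at least `minimum`."""
    major = java_major_version(banner)
    return major is not None and major >= minimum
```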
Upvotes: 16
Reputation: 374
You have not passed a file path to from_file in this line:
parsed = parser.from_file('')
Give it a file to parse, e.g.:
parsed = parser.from_file('myfile.txt')
The server didn't start, and presumably this "no log" warning gets triggered (see line 644 in the source on GitHub); then another error message tells you it isn't going to work.
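One way to catch the empty-path mistake early is to validate the path before handing it to the parser, so a missing or empty file name produces a clear Python error. This is a minimal sketch; require_file is a hypothetical helper, not part of tika.

```python
from pathlib import Path


def require_file(path: str) -> Path:
    """Return `path` as a Path, or raise a clear error if it is unusable.

    Rejects an empty string (as in parser.from_file('')) and paths that
    do not point to an existing regular file.
    """
    if not path:
        raise ValueError("from_file() needs a file path, got an empty string")
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(f"No such file: {p}")
    return p
```

Usage: parsed = parser.from_file(str(require_file('myfile.txt'))).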
Upvotes: 1