Edward Tanguay

Reputation: 193362

How to use Tika via PHP when both installed on one server?

What is the best way to call Tika from PHP in order to get the plain text of an uploaded file into PHP?

Searching around I find:

  1. PHP code that makes calls to a "Tika server" e.g. with cURL
  2. PHP Wrapper classes for Tika which seem to use Tika on the same server that PHP is installed on, but I have not gotten any of them to work.
  3. Alternatively, I could simply call Tika via the exec command.

But I'm not sure what is the easiest way to proceed.

Upvotes: 5

Views: 2447

Answers (2)

Kamafeather

Reputation: 9845

Simpler approach (call API)

For running on a remote server, I suggest using curl or Guzzle to call the API (you could also simply use file_get_contents and pass it the URL of the API that calls Tika on the remote server).
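As a minimal sketch of the API approach with cURL: it assumes a standard tika-server instance, which by default listens on port 9998 and returns plain text for a PUT to /tika with an "Accept: text/plain" header. The function name and the base URL are placeholders for illustration, and the whole file is read into memory, so very large uploads may need a streaming variant.

```php
<?php
// Sketch: send a local file to a Tika server and return the extracted plain text.
// Assumption: tika-server is reachable at $baseUrl (9998 is its default port).
function tikaServerExtractText(string $filePath, string $baseUrl = 'http://localhost:9998'): string
{
    if (!is_file($filePath) || !is_readable($filePath)) {
        throw new RuntimeException("Cannot read file: $filePath");
    }

    $ch = curl_init(rtrim($baseUrl, '/') . '/tika');
    curl_setopt_array($ch, [
        CURLOPT_CUSTOMREQUEST  => 'PUT',
        CURLOPT_POSTFIELDS     => file_get_contents($filePath), // raw file bytes as the request body
        CURLOPT_HTTPHEADER     => [
            'Accept: text/plain',                     // ask for plain text back
            'Content-Type: application/octet-stream', // let Tika detect the real type
        ],
        CURLOPT_RETURNTRANSFER => true,               // return the body instead of printing it
    ]);

    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('Request to Tika failed: ' . curl_error($ch));
    }

    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    if ($status !== 200) {
        throw new RuntimeException("Tika returned HTTP $status");
    }

    return $body;
}
```

Called as tikaServerExtractText($_FILES['upload']['tmp_name']), where 'upload' is a hypothetical form field name, this would return the text of the uploaded file.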

Other approach (execute process on local server)

To run the parsing locally (Tika and PHP on the same server), I used Symfony/Process.

Personally, I'd discourage you from just using exec.
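To illustrate what running Tika as a local process without a bare exec might look like, here is a dependency-free sketch that uses PHP's built-in proc_open with an argument array (PHP 7.4+) instead of Symfony/Process. It assumes java is on the PATH and that the tika-app jar accepts --text for plain-text output; the function names and the jar path in the comment are hypothetical.

```php
<?php
// Hypothetical helper: argument vector for the tika-app command line.
// "--text" asks tika-app for plain-text output.
function tikaAppCommand(string $jarPath, string $filePath): array
{
    return ['java', '-jar', $jarPath, '--text', $filePath];
}

// Run a command given as an array, so no shell is involved and the
// arguments need no escaping. Returns stdout; throws on a non-zero exit.
function runAndCapture(array $command): string
{
    // stderr goes to a temp file, so a full stderr pipe cannot block the child
    $stderr = tmpfile();
    $process = proc_open($command, [1 => ['pipe', 'w'], 2 => $stderr], $pipes);
    if (!is_resource($process)) {
        throw new RuntimeException('Could not start process');
    }

    $output = stream_get_contents($pipes[1]);
    fclose($pipes[1]);
    $exitCode = proc_close($process);

    if ($exitCode !== 0) {
        rewind($stderr);
        throw new RuntimeException("Process failed ($exitCode): " . stream_get_contents($stderr));
    }

    return $output;
}

// Possible usage (the jar path is an assumption):
// $text = runAndCapture(tikaAppCommand('/opt/tika/tika-app.jar', $uploadedFilePath));
```

Symfony/Process wraps the same idea in a class: new Process([...]) takes a similar argument array, and mustRun() followed by getOutput() gives the captured text, with timeouts handled for you.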


I would add that having Tika on another server forces you to send that server the whole file uploaded by the user. A faster solution is to receive the upload in PHP and call the Tika process directly from the same script (or at least on the same machine). Otherwise you need a script that:

  • receives the uploaded data
  • uploads that to the Tika server (maybe as payload of an API call)
  • tells Tika (through the API) on the remote server to parse the file
  • downloads the parsed response data
  • works with it or displays it.

As I highlighted, there will be much more overhead from the communication between the two servers, which is undesirable when the file to parse is, say, a 35 MB PDF. The user would have to wait, let's say, 2 minutes for the upload, plus another 20 seconds or so to send the file to the Tika server, and then another 3 seconds or so to get the parsed text back.

I strongly suggest keeping Tika on the same server as PHP.

Upvotes: 4

Itay Moav -Malimovka

Reputation: 53607

If it is on your own managed servers, and both the PHP and Tika locations are known to you, just use exec. Or, if you prefer better control (which I suspect you do not need), use shell_exec.
If you run into performance issues and/or need to scale this, then there is room for a more elaborate solution.
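A rough sketch of the plain exec route, assuming java is on the PATH and the tika-app jar accepts --text for plain-text output. The function names and the jar path in the comment are made up for illustration. Both paths are passed through escapeshellarg, because the command goes through a shell and an uploaded file's name should not be trusted.

```php
<?php
// Hypothetical helper: build the shell command line for tika-app.
// Both paths are quoted with escapeshellarg() because exec() goes through a shell.
function tikaExecCommand(string $jarPath, string $filePath): string
{
    return 'java -jar ' . escapeshellarg($jarPath) . ' --text ' . escapeshellarg($filePath);
}

// Run tika-app via exec() and return its stdout as one string.
function tikaExecExtractText(string $jarPath, string $filePath): string
{
    $lines = [];
    $exitCode = 0;
    exec(tikaExecCommand($jarPath, $filePath), $lines, $exitCode);

    if ($exitCode !== 0) {
        throw new RuntimeException("tika-app exited with code $exitCode");
    }

    // exec() collects output line by line with trailing whitespace stripped
    return implode("\n", $lines);
}

// Possible usage (the jar path is an assumption):
// $text = tikaExecExtractText('/opt/tika/tika-app.jar', $uploadedFilePath);
```

Note that exec only captures stdout, so anything Tika writes to stderr ends up wherever PHP's own stderr goes, typically the web server's error log.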

Upvotes: 2
