Reputation: 3761
I am trying to find out the content type of a file using Apache Tika.
While doing so, I ran into this inconsistent behaviour:
final Tika tika = new Tika();
String fileType = tika.detect(uploadedInputStream);
System.out.println(fileType);
String newFileType = tika.detect(uploadedInputStream);
System.out.println(newFileType);
The above code gives me this output:
application/pdf
application/octet-stream
I am expecting the output to be application/pdf in both cases.
Can anyone explain why this happens? How can I get the intended result?
Upvotes: 3
Views: 1619
Reputation: 1
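When Tika detects the content type, it reads from the front of your input stream, and for a plain stream without mark/reset support those bytes are not put back. So when you pass the same input stream a second time, it is no longer positioned at the start of the file (for a small file it may already be exhausted), the PDF header is not visible any more, and you get application/octet-stream (unknown binary). Wrapping the stream once in a TikaInputStream and reusing that wrapper avoids this: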
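import org.apache.tika.Tika;
import org.apache.tika.io.TikaInputStream;

Tika tika = new Tika();
// Wrap the stream once and reuse the same wrapper for every detect() call
TikaInputStream tikaInputStream = TikaInputStream.get(inputStream);
String fileType = tika.detect(tikaInputStream);
System.out.println(fileType);
// A second Tika instance is not required; reusing the wrapped stream is what matters
final Tika newTika = new Tika();
String newFileType = newTika.detect(tikaInputStream);
System.out.println(newFileType);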
You can also make duplicate copies of your source input stream to preserve it.
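A minimal sketch of the copy approach, using only java.io so it runs without Tika on the classpath. It assumes the upload is small enough to hold in memory and Java 11+ (for readNBytes). The class name and the buffer and header helpers are made up for illustration; header only stands in for a real detector by returning the first few bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class StreamCopySketch {

    // Read the whole source stream into memory once (assumes a small upload).
    static byte[] buffer(InputStream source) {
        try {
            return source.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical stand-in for a detector: returns the first n bytes as text.
    static String header(InputStream in, int n) {
        try {
            return new String(in.readNBytes(n), StandardCharsets.US_ASCII);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        InputStream uploaded = new ByteArrayInputStream(
                "%PDF-1.4 rest of the file".getBytes(StandardCharsets.US_ASCII));
        byte[] copy = buffer(uploaded);

        // Each consumer gets its own fresh stream over the same bytes,
        // so both reads start at the beginning of the file.
        System.out.println(header(new ByteArrayInputStream(copy), 5)); // %PDF-
        System.out.println(header(new ByteArrayInputStream(copy), 5)); // %PDF-
    }
}
```

With real Tika you would pass a new ByteArrayInputStream(copy) to each detect() call instead of calling header. As far as I know, Tika also has a detect(byte[]) overload that takes the leading bytes directly.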
Upvotes: 0
Reputation: 3761
When I wrapped the InputStream in a TikaInputStream as suggested in the comments, the problem was solved:
final Tika tika = new Tika();
TikaInputStream tikaInputStream = TikaInputStream.get(uploadedInputStream);
String fileType = tika.detect(tikaInputStream);
System.out.println(fileType);
final Tika newTika = new Tika();
String newFileType = newTika.detect(tikaInputStream);
System.out.println(newFileType);
Output:
application/pdf
application/pdf
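My understanding of why the wrapper helps, as a sketch in plain java.io rather than Tika's actual code: a stream that supports mark/reset can be rewound after the header has been read, while a raw stream cannot. Here peek and unmarkable are hypothetical helpers, and BufferedInputStream stands in for the wrapper on the assumption that TikaInputStream offers similar mark/reset support.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class MarkResetSketch {

    // Build a stream without mark/reset support, like many upload streams.
    static InputStream unmarkable(byte[] data) {
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        return new InputStream() {
            @Override
            public int read() {
                return in.read();
            }
        };
    }

    // Read the first n bytes, rewinding afterwards only if the stream allows it.
    static String peek(InputStream in, int n) {
        try {
            if (in.markSupported()) {
                in.mark(n);
            }
            String head = new String(in.readNBytes(n), StandardCharsets.US_ASCII);
            if (in.markSupported()) {
                in.reset();
            }
            return head;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = "%PDF-1.4 body".getBytes(StandardCharsets.US_ASCII);

        // Raw stream: the second peek starts after the consumed header.
        InputStream raw = unmarkable(data);
        System.out.println(peek(raw, 5)); // %PDF-
        System.out.println(peek(raw, 5)); // 1.4 b

        // Wrapped stream: mark/reset rewinds, so both peeks see the header.
        InputStream wrapped = new BufferedInputStream(unmarkable(data));
        System.out.println(peek(wrapped, 5)); // %PDF-
        System.out.println(peek(wrapped, 5)); // %PDF-
    }
}
```

The important part is that the same wrapper object is passed both times. Wrapping the raw stream again for the second call would start from wherever the first wrapper left the underlying stream.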
Upvotes: 2