Reputation: 11427
I use Apache Tika to extract text from all kinds of files. Now I also want to use it to detect the correct MIME type of a file.
This works, for example, for:
application/pdf
text/html
...but not for:
text/plain instead of text/css
text/plain instead of text/javascript
(These MIME-type results come from my application and also from the tika-app.)
For my application I need an exact MIME type like text/css instead of the general text/plain. Is this possible with Tika?
Upvotes: 1
Views: 2242
Reputation: 48326
You need to do two things. Firstly, you need to supply the filename to Tika, so it can use that to help specialise the plain text type into the appropriate subtype (CSS, JS etc). Secondly, you need to make sure you're using a new enough version of Tika.
I've just tried with the latest version of Tika, and with passing the filename in, and it can detect JS and CSS files just fine:
$ java -jar tika-app-1.3-SNAPSHOT.jar --detect testCSS.css
text/css
$ java -jar tika-app-1.3-SNAPSHOT.jar --detect testJS.js
application/javascript
Also, the latest version of Tika (as of r1400795) has a unit test that automatically verifies that JS and CSS detection work, so you can be doubly sure it works fine!
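If you are calling Tika from your own code rather than from tika-app, the first point (supplying the filename) could look roughly like the sketch below. It assumes tika-core is on the classpath and uses the `Tika` facade's `detect(InputStream, String)` overload, which takes the resource name as a hint. The class name `DetectWithName`, the helper `detectType`, and the sample content are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

import org.apache.tika.Tika;

public class DetectWithName {

    // Hypothetical helper: detects the MIME type of some bytes,
    // optionally using a file name as an extra hint.
    static String detectType(byte[] content, String fileName) {
        Tika tika = new Tika();
        try (InputStream stream = new ByteArrayInputStream(content)) {
            if (fileName == null) {
                // Content only: plain-text formats usually come back as text/plain.
                return tika.detect(stream);
            }
            // Content plus name: the name lets Tika narrow text/plain to a subtype.
            return tika.detect(stream, fileName);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] css = "body { margin: 0; }".getBytes(StandardCharsets.UTF_8);
        System.out.println("without name: " + detectType(css, null));
        System.out.println("with name:    " + detectType(css, "testCSS.css"));
    }
}
```

If you already have a `java.io.File`, `tika.detect(file)` should pick up the name for you, so the missing subtype is usually a sign that only a bare stream was passed in.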
Upvotes: 2