Sonson123

Reputation: 11427

Tika: MIME type detection of JS and CSS files

I use Apache Tika to extract text from all kinds of files. Now I also want to use it to detect the correct MIME type of a file.

This works for example for...

...but not for:

(These MIME type results come from both my application and the tika-app.)

For my application I need an exact MIME type such as text/css instead of the general text/plain. Is this possible with Tika?

Upvotes: 1

Views: 2242

Answers (1)

Gagravarr

Reputation: 48326

You need to do two things. Firstly, you need to supply the filename to Tika, so it can use that to help specialise the plain text type into the appropriate subtype (CSS, JS etc). Secondly, you need to make sure you're using a new enough version of Tika.

I've just tried the latest version of Tika, passing the filename in, and it detects JS and CSS files just fine:

$ java -jar tika-app-1.3-SNAPSHOT.jar --detect testCSS.css 
text/css

$ java -jar tika-app-1.3-SNAPSHOT.jar --detect testJS.js
application/javascript

Also, the latest version of Tika (as of r1400795) has a unit test that automatically verifies that JS and CSS detection works, so you can be doubly sure it works fine!
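To illustrate why the filename matters, here is a minimal, self-contained sketch of the idea. It is not Tika's actual implementation: the class name, the tiny extension table, and the crude "is it text?" check are all assumptions made for the example. CSS and JS have no magic bytes, so looking at content alone can only say "plain text", and the name is what lets a detector pick the subtype. With Tika itself, the name is typically supplied through the `Tika` facade, for example `detect(InputStream, String)`.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Map;

public class NameHintDetection {
    // Hypothetical extension table for this sketch; Tika ships a far larger registry.
    private static final Map<String, String> TEXT_SUBTYPES_BY_EXTENSION = Map.of(
            "css", "text/css",
            "js", "application/javascript");

    // Content-only guess: text files carry no magic bytes, so the best
    // answer available from the bytes alone is the generic text/plain.
    static String detectFromContent(byte[] prefix) {
        for (byte b : prefix) {
            // Treat control characters other than tab, LF and CR as a sign of binary data.
            if (b >= 0 && b < 0x20 && b != '\t' && b != '\n' && b != '\r') {
                return "application/octet-stream";
            }
        }
        return "text/plain";
    }

    // Refine a text/plain result using the file extension, if a name was given.
    static String detect(byte[] prefix, String fileName) {
        String fromContent = detectFromContent(prefix);
        if (fileName == null || !fromContent.equals("text/plain")) {
            return fromContent;
        }
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) {
            return fromContent;
        }
        String extension = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return TEXT_SUBTYPES_BY_EXTENSION.getOrDefault(extension, fromContent);
    }

    public static void main(String[] args) {
        byte[] css = "body { color: red; }".getBytes(StandardCharsets.UTF_8);
        System.out.println(detect(css, null));          // prints text/plain
        System.out.println(detect(css, "testCSS.css")); // prints text/css
    }
}
```

The same bytes yield text/plain without a name and text/css with one, which matches the behaviour described in the question and in this answer.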

Upvotes: 2
