Pat
Pat

Reputation: 101

Tika AutoDetectParser returning empty string?

I'm attempting to use Tika's AutoDetectParser to pull a file's content. I originally thought this was a dependency issue but cannot fathom how that could still be true now that i'm including all of tika-app in my jar.

AutoDetect Parser returns emptry string here :

BodyContentHandler handler = new BodyContentHandler();  
AutoDetectParser parser = new AutoDetectParser();
Metadata metadata = new Metadata();
ParseContext context = new ParseContext();
FileInputStream mypdfstream = new FileInputStream(new File("mypdf.pdf"));
parser.parse(mypdfstream,handler,metadata,context);
System.out.println(handler.toString());

Further confusing me is the fact that using a standard PDFParser works fine...:

BodyContentHandler handler = new BodyContentHandler();
Metadata metadata = new Metadata();
ParseContext context = new ParseContext();
FileInputStream mypdfstream = new FileInputStream(new File("mypdf.pdf"));
PDFParser pdfparser = new PDFParser();
pdfparser.parse(mypdfstream,handler,metadata,context);
System.out.println(handler.toString());

I have included both the tika-app and tika-parsers jar on my classpath and included them within the jar created by ant.

relevant portions of build.xml

<javac srcdir="${src}" destdir="${build}">
                <classpath>
                        <pathelement path = "lib/tika-app-1.11.jar"/>
                        <pathelement path = "lib/tika-parsers-1.11.jar"/>
                </classpath>
 </javac>

<jar jarfile="${dist}/lib/MyProject-${DSTAMP}.jar" basedir="${build}">
        <zipgroupfileset dir="lib" includes="tika-app-1.11.jar"/>
        <zipgroupfileset dir="lib" includes="tika-parsers-1.11.jar"/>
</jar>

Edit: I looked at my list of supportedTypes with parser.getSupportTypes(context)) and it was empty. As is the list of parsers returned from parser.getParsers().

So perhaps this is yet another dependency issue? This truly surprises me given tika-app is included.

Upvotes: 9

Views: 4292

Answers (2)

Elya
Elya

Reputation: 1

For me downgrading libs to 1.18 worked. Shame that newer versions are not working

Upvotes: 0

Leonardo Bouchan
Leonardo Bouchan

Reputation: 356

I have the same issue, i have corrected adding the Tika Core and Parser dependency on my Pom.xml like this again and then Update Maven on Eclipse.

    <dependency>
      <groupId>org.apache.tika</groupId>
      <artifactId>tika-core</artifactId>
      <version>1.18</version>
    </dependency>
    <dependency>
      <groupId>org.apache.tika</groupId>
      <artifactId>tika-parsers</artifactId>
      <version>1.18</version>
    </dependency>

Upvotes: 3

Related Questions