vikas Madaan

Reputation: 23

Converting multiple PDF files to txt in Java

I am using PDFBox (through Apache Tika's PDFParser) to convert a PDF to text, but I have multiple PDF files in a folder, and each one needs to be written to its own .txt file. My source code, which handles only a single file, is:

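import java.io.File;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.IOException;
import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.pdf.PDFParser;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.SAXException;

public class PDFconversion
{
    public static void main(final String[] args) throws IOException, SAXException, TikaException
    {
        // "sourcefile" is a placeholder for the path of the PDF to convert
        File file = new File("sourcefile");

        // parse method parameters
        BodyContentHandler handler = new BodyContentHandler();
        Metadata metadata = new Metadata();
        metadata.set("org.apache.tika.parser.pdf.sortbyposition", "true");
        ParseContext pcontext = new ParseContext();
        PDFParser pdfparser = new PDFParser();

        System.out.println("Parsing PDF to TEXT...");

        // try-with-resources closes both streams, which also flushes the writer
        try (FileInputStream inputstream = new FileInputStream(file);
             FileWriter fw = new FileWriter("targetfile")) {
            // parse the file, then write the extracted text
            pdfparser.parse(inputstream, handler, metadata, pcontext);
            fw.write(handler.toString().trim());
        }
    }
}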

Upvotes: 0

Views: 508

Answers (1)

Tim Allison

Reputation: 635

How about 'java -jar tika-app.jar -t -i #input_dir# -o #output_dir#' (replace #input_dir# and #output_dir# with your own paths)? That invokes batch mode, which converts a full directory into a mirror directory of .txt files, or of .json files with the '-J' option.
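
If you would rather stay inside your own Java program, the per-folder loop can be sketched roughly as follows. This is only a minimal sketch under some assumptions: the class name BatchPdfToTxt, the TextExtractor interface, and the targetFor and convertAll helpers are all hypothetical names made up for illustration, not part of Tika or PDFBox. The actual PDF parsing is left as a pluggable function, so the sketch uses only the standard library; it looks at files ending in ".pdf" directly inside the input folder and does not recurse into subfolders.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class BatchPdfToTxt {

    /** Hypothetical hook: turns one PDF file into its plain text. */
    interface TextExtractor {
        String extract(Path pdf) throws Exception;
    }

    /** Maps e.g. "report.pdf" to "report.txt" inside outputDir. */
    static Path targetFor(Path pdf, Path outputDir) {
        String name = pdf.getFileName().toString();
        String base = name.substring(0, name.length() - ".pdf".length());
        return outputDir.resolve(base + ".txt");
    }

    /** Converts every *.pdf directly inside inputDir; returns the files written. */
    static List<Path> convertAll(Path inputDir, Path outputDir, TextExtractor extractor)
            throws IOException {
        Files.createDirectories(outputDir);
        List<Path> written = new ArrayList<>();
        // The glob keeps only entries whose name ends in ".pdf"
        try (DirectoryStream<Path> pdfs = Files.newDirectoryStream(inputDir, "*.pdf")) {
            for (Path pdf : pdfs) {
                try {
                    String text = extractor.extract(pdf).trim();
                    Path target = targetFor(pdf, outputDir);
                    Files.write(target, text.getBytes(StandardCharsets.UTF_8));
                    written.add(target);
                } catch (Exception e) {
                    // Keep going so that one bad PDF does not stop the whole batch
                    System.err.println("Skipping " + pdf + ": " + e.getMessage());
                }
            }
        }
        return written;
    }
}
```

In your case the extractor would wrap the parse call from the question, creating a fresh BodyContentHandler for each file so that text from one PDF does not leak into the next, and returning handler.toString().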

Upvotes: 1
