tqn
tqn

Reputation: 125

Error in writing to .doc file by Apache POI

I'm writing a app to display and edit file .doc I'm using POI with HWPF. Now I can read text from file and write to file .doc too. But my reader only read default file .doc which is created by msoffice, It can't read the file created by my writer also msoffice can read this and all content was displayed right. It always show error:

Exception in thread "main" java.lang.RuntimeException:java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
at org.apache.poi.hwpf.extractor.WordExtractor.getText(WordExtractor.java:322)
at ReadPOI.main(ReadPOI.java:18)
Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
at java.util.ArrayList.RangeCheck(ArrayList.java:547)
at java.util.ArrayList.get(ArrayList.java:322)
at org.apache.poi.hwpf.usermodel.Range.binarySearchStart(Range.java:1016)
at org.apache.poi.hwpf.usermodel.Range.findRange(Range.java:1095)
at org.apache.poi.hwpf.usermodel.Range.initParagraphs(Range.java:982)
at org.apache.poi.hwpf.usermodel.Range.numParagraphs(Range.java:311)
at org.apache.poi.hwpf.converter.AbstractWordConverter.processParagraphes(AbstractWordConverter.java:1058)
at org.apache.poi.hwpf.converter.WordToTextConverter.processSection(WordToTextConverter.java:435)
at org.apache.poi.hwpf.converter.AbstractWordConverter.processSingleSection(AbstractWordConverter.java:1126)
at org.apache.poi.hwpf.converter.AbstractWordConverter.processDocument(AbstractWordConverter.java:722)
at org.apache.poi.hwpf.extractor.WordExtractor.getText(WordExtractor.java:304)
... 1 more

Are there any different between file created by msoffice and file created by my writer, and how to fix it. Please help me. There are my demo code in Java. Thank you

My reader:

import java.io.File;
import java.io.FileInputStream;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;
import org.apache.poi.hwpf.usermodel.Range;


public class ReadPOI 
{
public static void main(String args[]) throws Exception
{
    File file = new File("Test.doc");
    FileInputStream fin = new FileInputStream(file);
    HWPFDocument doc = new HWPFDocument(fin);
    Range range = doc.getRange();
    WordExtractor extractor = new WordExtractor(doc);
    System.out.println("starting\n" + extractor.getText() + "end\n");

    fin.close();
}
}

My Writer:

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;

import org.apache.poi.hwpf.HWPFDocument;

public class WritePOI
{
public static void main(String args[]) throws Exception
{
    File file = new File("Template.doc");
    FileInputStream fin = new FileInputStream(file);
    HWPFDocument doc = new HWPFDocument(fin);
    doc.getRange().replaceText("Haha\n", false);
    FileOutputStream fout = new FileOutputStream("Test.doc");
    doc.write(fout);
    fout.close();
    fin.close();
}
}

Upvotes: 4

Views: 1659

Answers (1)

PbxMan
PbxMan

Reputation: 7623

It's a bug in the WordExtractor getText() that even remains up to the version 3.10-FINAL. It should not give you an:

Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
    at java.util.ArrayList.rangeCheck(ArrayList.java:571)
    at java.util.ArrayList.get(ArrayList.java:349)
    at org.apache.poi.hwpf.usermodel.Range.binarySearchStart(Range.java:1016)

It is not marked as deprecated in the api but it says that getTextFromPieces() is faster. I double checked it using your example and it works OK.

So in the ReadPOI use:

    System.out.println(extractor.getTextFromPieces());

Or

    String [] dataArray = extractor.getParagraphText();
    for(int i=0;i<dataArray.length;i++)
    {
        System.out.println("\n–" + dataArray[i]);
    }

Upvotes: 1

Related Questions