Reputation: 125
I'm writing an app to display and edit .doc files, using Apache POI with HWPF. I can already read text from a .doc file and write to one. However, my reader can only read .doc files created by MS Office. It can't read a file created by my writer, even though MS Office opens that file and displays all of its content correctly. The reader always fails with this error:
Exception in thread "main" java.lang.RuntimeException: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
at org.apache.poi.hwpf.extractor.WordExtractor.getText(WordExtractor.java:322)
at ReadPOI.main(ReadPOI.java:18)
Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
at java.util.ArrayList.RangeCheck(ArrayList.java:547)
at java.util.ArrayList.get(ArrayList.java:322)
at org.apache.poi.hwpf.usermodel.Range.binarySearchStart(Range.java:1016)
at org.apache.poi.hwpf.usermodel.Range.findRange(Range.java:1095)
at org.apache.poi.hwpf.usermodel.Range.initParagraphs(Range.java:982)
at org.apache.poi.hwpf.usermodel.Range.numParagraphs(Range.java:311)
at org.apache.poi.hwpf.converter.AbstractWordConverter.processParagraphes(AbstractWordConverter.java:1058)
at org.apache.poi.hwpf.converter.WordToTextConverter.processSection(WordToTextConverter.java:435)
at org.apache.poi.hwpf.converter.AbstractWordConverter.processSingleSection(AbstractWordConverter.java:1126)
at org.apache.poi.hwpf.converter.AbstractWordConverter.processDocument(AbstractWordConverter.java:722)
at org.apache.poi.hwpf.extractor.WordExtractor.getText(WordExtractor.java:304)
... 1 more
Is there any difference between a file created by MS Office and a file created by my writer, and how can I fix this? Please help me. Here is my demo code in Java. Thank you.
My reader:
import java.io.File;
import java.io.FileInputStream;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;
import org.apache.poi.hwpf.usermodel.Range;

public class ReadPOI
{
    public static void main(String args[]) throws Exception
    {
        File file = new File("Test.doc");
        FileInputStream fin = new FileInputStream(file);
        HWPFDocument doc = new HWPFDocument(fin);
        Range range = doc.getRange();
        WordExtractor extractor = new WordExtractor(doc);
        System.out.println("starting\n" + extractor.getText() + "end\n");
        fin.close();
    }
}
My Writer:
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import org.apache.poi.hwpf.HWPFDocument;

public class WritePOI
{
    public static void main(String args[]) throws Exception
    {
        File file = new File("Template.doc");
        FileInputStream fin = new FileInputStream(file);
        HWPFDocument doc = new HWPFDocument(fin);
        doc.getRange().replaceText("Haha\n", false);
        FileOutputStream fout = new FileOutputStream("Test.doc");
        doc.write(fout);
        fout.close();
        fin.close();
    }
}
Upvotes: 4
Views: 1659
Reputation: 7623
This is a bug in WordExtractor's getText() that is still present as of version 3.10-FINAL. It should not throw this:
Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
at java.util.ArrayList.rangeCheck(ArrayList.java:571)
at java.util.ArrayList.get(ArrayList.java:349)
at org.apache.poi.hwpf.usermodel.Range.binarySearchStart(Range.java:1016)
getText() is not marked as deprecated in the API, but the documentation notes that getTextFromPieces() is faster. I double-checked getTextFromPieces() with your example and it works.
So in ReadPOI, use:
System.out.println(extractor.getTextFromPieces());
Or:
String[] dataArray = extractor.getParagraphText();
for (int i = 0; i < dataArray.length; i++)
{
    System.out.println("\n–" + dataArray[i]);
}
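If you would rather keep getText() for files where it works and only switch to the other method when it throws, one option is a small fallback helper. The following is only a sketch, not part of POI: the class name TextFallback and the method firstSuccessful are hypothetical names I made up, and the two suppliers in main are stand-ins so the snippet runs without POI on the classpath.

```java
import java.util.function.Supplier;

public class TextFallback {

    // Hypothetical helper (not a POI API): tries each supplier in order and
    // returns the first result that is produced without a RuntimeException.
    // If every attempt fails, the last exception is rethrown.
    @SafeVarargs
    public static String firstSuccessful(Supplier<String>... attempts) {
        RuntimeException last = null;
        for (Supplier<String> attempt : attempts) {
            try {
                return attempt.get();
            } catch (RuntimeException e) {
                last = e; // remember the failure and try the next extraction method
            }
        }
        throw last != null ? last : new IllegalArgumentException("no attempts given");
    }

    public static void main(String[] args) {
        // Stand-in for extractor.getText() failing as in the stack trace above.
        Supplier<String> failing = () -> {
            throw new IndexOutOfBoundsException("Index: 0, Size: 0");
        };
        // Stand-in for extractor.getTextFromPieces() succeeding.
        Supplier<String> working = () -> "Haha\n";

        System.out.print(firstSuccessful(failing, working));
    }
}
```

In ReadPOI the call would then presumably look like firstSuccessful(extractor::getText, extractor::getTextFromPieces). Keep in mind that the text from getTextFromPieces() may differ slightly from what getText() would have returned, so treat the fallback as a workaround rather than a fix for the file your writer produces.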
Upvotes: 1