Reputation: 335
I am trying to create a Java application that searches for a particular word in a selected doc or docx file and generates a report on it. The report should contain the page number and the line number of each occurrence of the searched word. So far I am able to read the doc and docx file paragraph by paragraph, but I have not found any way to search for a particular word and get the line and page number where that word appears. I have searched a lot, with no luck so far. I hope someone knows a way to do this.
Here is my code:

    // Choose the POI API by file extension: XWPF for .docx, HWPF for .doc.
    if (fc.getSelectedFile().getName().toLowerCase().endsWith(".docx")) {
        File file = fc.getSelectedFile();
        FileInputStream fis = new FileInputStream(file.getAbsolutePath());
        XWPFDocument document = new XWPFDocument(fis);
        List<XWPFParagraph> paragraphs = document.getParagraphs();
        System.out.println("Total no of paragraph " + paragraphs.size());
        // Print the text of every paragraph in the .docx body.
        for (XWPFParagraph para : paragraphs) {
            System.out.println(para.getText());
        }
        fis.close();
    } else {
        WordExtractor extractor = null;
        FileInputStream fis = new FileInputStream(fc.getSelectedFile());
        HWPFDocument document = new HWPFDocument(fis);
        extractor = new WordExtractor(document);
        // Print the text of every paragraph in the .doc file.
        String[] fileData = extractor.getParagraphText();
        for (int i = 0; i < fileData.length; i++) {
            if (fileData[i] != null)
                System.out.println(fileData[i]);
        }
        extractor.close();
    }
I am using Swing and Apache POI 3.10.1.
Upvotes: 4
Views: 4661
Reputation: 57381
I am afraid there is no easy way to do this. Line and page numbers aren't stored in the file; they are calculated on the fly from the text layout for the specified page size. The page width determines where the text wraps.
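To illustrate why the line number depends on the layout, here is a minimal sketch. It is not how Word lays out text: it assumes a hypothetical greedy wrap measured in characters (as if the font were monospaced), and the class and method names (WrapDemo, wrap, lineOf) are made up for the example. The same word ends up on a different line as soon as the width changes.

```java
import java.util.ArrayList;
import java.util.List;

class WrapDemo {
    // Greedy word wrap: each line holds as many words as fit in `width` characters.
    // Simplifying assumption: width is counted in characters, not in points.
    static List<String> wrap(String text, int width) {
        List<String> lines = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            // Start a new line if adding " word" would exceed the width.
            if (current.length() > 0 && current.length() + 1 + word.length() > width) {
                lines.add(current.toString());
                current.setLength(0);
            }
            if (current.length() > 0) {
                current.append(' ');
            }
            current.append(word);
        }
        if (current.length() > 0) {
            lines.add(current.toString());
        }
        return lines;
    }

    // Returns the 1-based number of the first wrapped line containing `word`, or -1.
    static int lineOf(String text, String word, int width) {
        List<String> lines = wrap(text, width);
        for (int i = 0; i < lines.size(); i++) {
            if (lines.get(i).contains(word)) {
                return i + 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String text = "the quick brown fox jumps over the lazy dog";
        System.out.println(lineOf(text, "lazy", 20)); // prints 2
        System.out.println(lineOf(text, "lazy", 10)); // prints 4
    }
}
```

A real document adds fonts, margins, tables and page breaks on top of this, which is why the numbers can only come from an actual layout pass.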
You can try to implement the feature yourself by loading the document into a JEditorPane with an appropriate EditorKit (see, for example, the DocxEditorKit implementation attempt at http://java-sl.com/docx_editor_kit.html ). It provides basic functionality, and you can try to build your own EditorKit based on its source code and ideas.
The kit should support pagination in order to render pages (see the articles about pagination at http://java-sl.com/articles.html ).
Once pagination is done, you can find the position of the word (its caret offset) and get the row and column from it (see http://java-sl.com/tip_row_column.html ).
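Finding the word itself does not need any layout. As a rough sketch, assuming the paragraph texts have already been collected into a list of strings (for example from para.getText() or extractor.getParagraphText() in the question's code), the following returns the paragraph index and character offset of every whole-word, case-insensitive match. The WordSearch and Hit names are hypothetical. Turning these offsets into line and page numbers still requires the paginated view described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordSearch {
    // One match: 0-based paragraph index and 0-based character offset in that paragraph.
    static final class Hit {
        final int paragraph;
        final int offset;

        Hit(int paragraph, int offset) {
            this.paragraph = paragraph;
            this.offset = offset;
        }

        @Override
        public String toString() {
            return "paragraph " + paragraph + ", offset " + offset;
        }
    }

    // Finds whole-word, case-insensitive occurrences of `word` in each paragraph.
    static List<Hit> find(List<String> paragraphs, String word) {
        Pattern pattern = Pattern.compile(
                "\\b" + Pattern.quote(word) + "\\b", Pattern.CASE_INSENSITIVE);
        List<Hit> hits = new ArrayList<>();
        for (int i = 0; i < paragraphs.size(); i++) {
            String text = paragraphs.get(i);
            if (text == null) {
                continue; // skip missing paragraphs, as the question's loop does
            }
            Matcher matcher = pattern.matcher(text);
            while (matcher.find()) {
                hits.add(new Hit(i, matcher.start()));
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> paragraphs = Arrays.asList(
                "Hello world", "Another World here, worldwide");
        for (Hit hit : find(paragraphs, "world")) {
            System.out.println(hit);
        }
    }
}
```

The paragraph index is at least a stable position to report, since unlike line and page numbers it does not change with page size or fonts.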
Upvotes: 5