SRS

Reputation: 449

Search for a particular string in a PDF document using Java

Hi, I have a PDF file and I need to search for a particular string in it. I have tried various methods and I am able to read all of the contents of the PDF file, but I am unable to find a particular string.

In this file, I need to search for strings such as Telephone, Garbage, Rent, etc. individually.

Could you please help me?

I have the below code for reading the file.

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

// Package names below are those of PDFBox 1.8.x, which matches the PDFParser(InputStream) constructor used here.
import org.apache.pdfbox.cos.COSDocument;
import org.apache.pdfbox.pdfparser.PDFParser;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.util.PDFTextStripper;

public class PDFBoxReader {

    // Defaults to the report used in the question; override with setFilePath().
    private String filePath = "D:\\report.pdf";

    public PDFBoxReader() {
    }

    // Extracts the text of pages 1 to 10 as a single String.
    public String ToText() throws IOException {
        PDFParser parser = new PDFParser(new FileInputStream(new File(filePath)));
        parser.parse();
        COSDocument cosDoc = parser.getDocument();
        PDDocument pdDoc = new PDDocument(cosDoc);
        try {
            PDFTextStripper pdfStripper = new PDFTextStripper();
            pdfStripper.setStartPage(1);
            pdfStripper.setEndPage(10);
            // reading text from page 1 to 10
            // if you want to get text from full pdf file use this code
            // pdfStripper.setEndPage(pdDoc.getNumberOfPages());
            return pdfStripper.getText(pdDoc);
        } finally {
            pdDoc.close(); // release the document once the text has been read
        }
    }

    public void setFilePath(String filePath) {
        this.filePath = filePath;
    }

}

It would be great if someone could help me with code that searches for a particular string. Thanks in advance.

Upvotes: 0

Views: 4325

Answers (1)

River

Reputation: 9093

Try String.indexOf("substring"), called on the String returned by your ToText() method, where substring is the text you wish to search for. (Side note: the convention in Java is camelCase method names, which would be toText() in this case.)

This method returns the index of the first occurrence of the given substring in your long String of text, or -1 if the substring does not occur at all. So you could call indexOf("Telephone") on that String to find the first occurrence of the word Telephone.

If you want the stuff directly after that substring, the index would simply be String.indexOf("substring")+"substring".length()
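As a rough sketch of that idea, the helper below returns whatever follows a keyword up to the end of its line. The class name ValueAfterKeyword, the method valueAfter and the sample text are made up for illustration, and the code assumes the extracted text keeps each label and its value on the same line, which may not hold for every PDF.

```java
class ValueAfterKeyword {

    // Returns the trimmed text between the first occurrence of keyword
    // and the end of that line, or null if keyword does not occur.
    static String valueAfter(String text, String keyword) {
        int start = text.indexOf(keyword);
        if (start < 0) {
            return null; // indexOf returns -1 when there is no match
        }
        int valueStart = start + keyword.length();
        int lineEnd = text.indexOf('\n', valueStart);
        if (lineEnd < 0) {
            lineEnd = text.length(); // keyword is on the last line
        }
        return text.substring(valueStart, lineEnd).trim();
    }

    public static void main(String[] args) {
        // Stand-in for the String returned by ToText(); the contents are invented.
        String text = "Telephone 45.00\nRent 1200.00\nGarbage 30.00";
        System.out.println(valueAfter(text, "Rent"));
        System.out.println(valueAfter(text, "Garbage"));
    }
}
```

Checking for -1 before adding the keyword length matters, because -1 plus the length is a valid but meaningless index.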

You can even find the next occurrence (or the one after that) with the two-argument variant of this method: String.indexOf("substring", indexOfLastOccurrence + "substring".length())

Example:

String myPDF = ToText();
int rentIndex = myPDF.indexOf("Rent"); // index of the 1st occurrence of "Rent", or -1 if there is none
if (rentIndex >= 0) {
    int afterRent = rentIndex + "Rent".length();
    String rent = myPDF.substring(afterRent); // everything after the 1st occurrence of "Rent"
    // Narrow it down with rent.substring(beginIndex, endIndex), choosing the indices to suit your layout.
    // (I assume you only want like a few numbers afterwards or something.)
    // process rent e.g. Integer.parseInt(rent.trim()) or something

    rentIndex = myPDF.indexOf("Rent", afterRent); // next occurrence of "Rent", or -1
    // Repeat to find the next occurrence, and the one after that.
    // (Until indexOf returns -1, indicating that no more occurrences exist.)
}
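
The "repeat" step can also be written as a loop. This is only a minimal sketch: the class name KeywordSearch, the method findAll and the sample text are hypothetical, and the String passed in stands for whatever ToText() returns.

```java
import java.util.ArrayList;
import java.util.List;

class KeywordSearch {

    // Returns the start index of every occurrence of keyword in text, in order.
    static List<Integer> findAll(String text, String keyword) {
        List<Integer> positions = new ArrayList<>();
        if (keyword.isEmpty()) {
            return positions; // an empty keyword would match everywhere; treat it as no match
        }
        int index = text.indexOf(keyword);
        while (index >= 0) {
            positions.add(index);
            // Resume the search just past the current match.
            index = text.indexOf(keyword, index + keyword.length());
        }
        return positions;
    }

    public static void main(String[] args) {
        // Stand-in for the String returned by ToText(); the contents are invented.
        String text = "Telephone 45.00\nRent 1200.00\nGarbage 30.00\nRent 1250.00\n";
        System.out.println(findAll(text, "Rent"));
        System.out.println(findAll(text, "Water"));
    }
}
```

An empty list means the keyword was not found. Note that indexOf is case-sensitive, so "rent" and "Rent" are different keywords here.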

Both methods can be found in the Java API: http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#indexOf(java.lang.String)

Upvotes: 1
