Enzo

Reputation: 316

Java PDFBox, extract data from a column of a table

I would like to find out how to extract data from this PDF (example image): http://postimg.org/image/ypebht5dx/

For example, I want to extract only the values in the column "TENSIONE[V]", and whenever a cell is blank I want to write the letter "X" in the output instead. How can I do this?

The code I used is this:

 PDDocument p=PDDocument.load(new File("a.pdf"));
 PDFTextStripper t=new PDFTextStripper();
 System.out.println(t.getText(p));

and I get this output:

http://s23.postimg.org/wbhcrw03v/Immagine.png

Upvotes: 0

Views: 6768

Answers (1)

Smit

Reputation: 4715

These are just guidelines; adapt them to your use case. The code is untested, but it should help you solve your issue. If you have any questions, let me know.

String text = t.getText(p);
String[] lines = text.split("\\r?\\n"); // all lines, split on line breaks

String[] cols = lines[0].split("\\s+"); // cells of the first line, split on whitespace
// cols[0] contains the pin
// cols[1] contains TENSIONE[V]
// cols[2] contains TOLLERANZA; if that cell is missing, cols simply has no third element
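To get the "X" for blank cells, the same split-based idea can be applied to every line. Here is a rough sketch, not tested against your PDF. The class name `ColumnExtractor`, the method `secondColumn` and the sample text are made up for illustration, and it assumes that a row with a blank TENSIONE[V] cell comes out of the stripper with only the pin on that line.

```java
import java.util.ArrayList;
import java.util.List;

public class ColumnExtractor {

    // Returns the second whitespace-separated token of every non-blank line,
    // or "X" when the line has fewer than two tokens.
    public static List<String> secondColumn(String text) {
        List<String> values = new ArrayList<>();
        for (String line : text.split("\\r?\\n")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip empty lines between rows
            }
            String[] cols = trimmed.split("\\s+");
            values.add(cols.length > 1 ? cols[1] : "X");
        }
        return values;
    }

    public static void main(String[] args) {
        // Hypothetical sample standing in for t.getText(p); the real layout may differ.
        String sample = "PIN TENSIONE[V] TOLLERANZA\n1 5.0 0.1\n2\n3 12.0 0.5\n";
        System.out.println(secondColumn(sample));
    }
}
```

Keep in mind that splitting on whitespace cannot tell which cell is missing when a row has fewer tokens than columns. If the voltage is blank but the tolerance is present, the tolerance would end up in `cols[1]`. In that case a position-based approach is probably needed, for example PDFBox's `PDFTextStripperByArea` with a rectangle around the column.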

Upvotes: 1
