Reputation: 11
I have a P&ID drawing (not a scanned copy) as a PDF. The PDF has separate layers for text, objects, instrument lines, etc., and I can see those layers. My objective is to copy the text and process it as required. However, I am not able to copy any of the text from code. Is there a way to do it? Currently, I am using simple code that opens the PDF and then sends the keys Ctrl+A and Ctrl+C.
Upvotes: 1
Views: 580
Reputation: 9032
Or, if you insist on doing it the current way, here's a Java implementation:
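import java.awt.Desktop;
import java.awt.Robot;
import java.awt.event.KeyEvent;
import java.io.File;

// The statements below belong inside a method that declares "throws Exception".
// open the PDF in the default viewer and give it time to load
Desktop.getDesktop().open(new File("C:\\Users\\Joris Schellekens\\Desktop\\pdfs\\30.pdf"));
Thread.sleep(5000);
Robot robot = new Robot();
robot.delay(1000);
// press CTRL+A (select all)
robot.keyPress(KeyEvent.VK_CONTROL);
robot.keyPress(KeyEvent.VK_A);
robot.keyRelease(KeyEvent.VK_A);
robot.keyRelease(KeyEvent.VK_CONTROL);
// press CTRL+C (copy)
robot.keyPress(KeyEvent.VK_CONTROL);
robot.keyPress(KeyEvent.VK_C);
robot.keyRelease(KeyEvent.VK_C);
robot.keyRelease(KeyEvent.VK_CONTROL);
// open empty file in Notepad and give it time to start
Runtime.getRuntime().exec("notepad.exe");
Thread.sleep(5000);
// press CTRL+V (paste)
robot.keyPress(KeyEvent.VK_CONTROL);
robot.keyPress(KeyEvent.VK_V);
robot.keyRelease(KeyEvent.VK_V);
robot.keyRelease(KeyEvent.VK_CONTROL);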
For performance reasons, I'd time how long it takes to open the document and how long it takes to open Notepad, then shorten the two Thread.sleep(5000) delays to match. That way you're not wasting precious milliseconds waiting.
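If measuring the delays by hand is inconvenient, one alternative is to poll for a condition instead of sleeping for a fixed time. The following is only a sketch: the class name Polling and the method waitUntil are hypothetical helpers, not part of the original answer or of any library, and the condition you pass in is up to you.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper: poll for a condition instead of sleeping a fixed time. */
final class Polling {
    private Polling() {}

    /**
     * Checks the condition every pollMillis until it holds or timeoutMillis elapses.
     * Returns true if the condition held, false on timeout or interruption.
     */
    static boolean waitUntil(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
    }
}
```

A possible condition after the Ctrl+C step is whether the system clipboard currently offers text, for example Toolkit.getDefaultToolkit().getSystemClipboard().isDataFlavorAvailable(DataFlavor.stringFlavor). Note that this would also be true if the clipboard already held older text, so it may be worth clearing or overwriting the clipboard first.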
Upvotes: 7
Reputation: 9032
Consider using iText. It can read a PDF document from a file, a generic InputStream, or a byte[], and it provides methods for text extraction. With some tweaking you can also extract the location of each piece of text.
Upvotes: 3