Subrat Pathak

Reputation: 11

Not able to copy text from PDF to Excel using VBA

I have a PID drawing (not a scanned copy) as a PDF. The PDF has separate layers for text, objects, instrument lines, etc., and I can see those layers. My objective is to copy the text and process it as required. However, I am not able to copy any of the text with my current code. Is there a way to do it? Currently, I am using simple code that opens the PDF and then sends the keys Ctrl+A and Ctrl+C.

Upvotes: 1

Views: 580

Answers (2)

Joris Schellekens

Reputation: 9032

Or, if you insist on doing it the current way, here's a Java implementation that sends the same keystrokes with java.awt.Robot:

import java.awt.Desktop;
import java.awt.Robot;
import java.awt.event.KeyEvent;
import java.io.File;

public class CopyPdfText {

    // press and release the given key while CTRL is held down
    private static void pressCtrl(Robot robot, int key) {
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(key);
        robot.keyRelease(key);
        robot.keyRelease(KeyEvent.VK_CONTROL);
    }

    public static void main(String[] args) throws Exception {
        // open the PDF in the default viewer and give it time to load
        Desktop.getDesktop().open(new File("C:\\Users\\Joris Schellekens\\Desktop\\pdfs\\30.pdf"));
        Thread.sleep(5000);

        Robot robot = new Robot();
        robot.delay(1000);

        // press CTRL+A (select all), then CTRL+C (copy)
        pressCtrl(robot, KeyEvent.VK_A);
        pressCtrl(robot, KeyEvent.VK_C);

        // open an empty Notepad window and wait for it to get focus
        new ProcessBuilder("notepad.exe").start();
        Thread.sleep(5000);

        // press CTRL+V (paste)
        pressCtrl(robot, KeyEvent.VK_V);
    }
}

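For performance reasons, I'd time how long the PDF viewer and Notepad actually take to open on your machine and shorten the Thread.sleep(5000) calls to match. That way you're not wasting precious milliseconds waiting. Don't cut them too short, though: if a window doesn't have focus yet, the keystrokes go to whichever window does.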
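Instead of hard-coding the delays you measured, you could poll for a "ready" condition and record how long it actually took. The sketch below assumes you have some cheap check that tells you the next step can proceed; the class and method names (WaitUtil, waitUntil) are hypothetical, not part of any library.

```java
import java.util.function.BooleanSupplier;

public class WaitUtil {
    /**
     * Hypothetical helper: checks condition every pollMillis ms until it returns true
     * or timeoutMillis elapses. Returns the elapsed time in ms once the condition
     * holds, or -1 on timeout or interruption.
     */
    public static long waitUntil(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
        long start = System.nanoTime();
        long deadline = start + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                // report how long we actually had to wait
                return (System.nanoTime() - start) / 1_000_000L;
            }
            if (System.nanoTime() - deadline >= 0) {
                return -1; // gave up: condition never became true
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return -1;
            }
        }
    }
}
```

For the CTRL+C step, one possible condition (an assumption, untested against a real PDF viewer) is that the string returned by Toolkit.getDefaultToolkit().getSystemClipboard() differs from what it held before the copy. Checking only that the clipboard holds text isn't enough, because text left over from an earlier copy would satisfy that immediately. The returned elapsed time also gives you the measurement suggested above.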

Upvotes: 7

Joris Schellekens

Reputation: 9032

Consider using iText. It can read a PDF document (from a file, a generic InputStream, or a byte[]) and provides methods for text extraction. With some tweaking, you can also extract the location of each piece of text.
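Once you have the text fragments and their positions, one way to get something Excel can open is to group fragments that share a baseline into rows and write them out as CSV. This is a minimal sketch. It assumes you have already collected each fragment's text and baseline coordinates yourself (for example from iText's TextRenderInfo in a custom event listener, not shown). The TextChunk record, the tolerance value and the method names are hypothetical. It also assumes default PDF user space, where y grows upwards, so larger y means nearer the top of the page. Records need Java 16 or later.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class TextRows {
    // Hypothetical holder for one extracted text fragment and its baseline start point.
    public record TextChunk(String text, float x, float y) {}

    /** Groups chunks whose y is within tolerance of the row's first chunk; top row first, left to right. */
    public static List<List<TextChunk>> groupIntoRows(List<TextChunk> chunks, float tolerance) {
        List<TextChunk> sorted = new ArrayList<>(chunks);
        // top of the page first (larger y), then left to right
        sorted.sort(Comparator.comparingDouble((TextChunk c) -> -c.y()).thenComparingDouble(TextChunk::x));
        List<List<TextChunk>> rows = new ArrayList<>();
        float rowY = Float.NaN;
        for (TextChunk c : sorted) {
            if (rows.isEmpty() || Math.abs(c.y() - rowY) > tolerance) {
                rows.add(new ArrayList<>()); // start a new row at this baseline
                rowY = c.y();
            }
            rows.get(rows.size() - 1).add(c);
        }
        for (List<TextChunk> row : rows) {
            row.sort(Comparator.comparingDouble(TextChunk::x)); // cells left to right
        }
        return rows;
    }

    /** Quotes a CSV field if it contains a comma, quote or line break (RFC 4180 style). */
    public static String csvField(String s) {
        if (s.contains(",") || s.contains("\"") || s.contains("\n") || s.contains("\r")) {
            return "\"" + s.replace("\"", "\"\"") + "\"";
        }
        return s;
    }

    /** Renders rows as CSV text, one cell per chunk, CRLF between rows. */
    public static String toCsv(List<List<TextChunk>> rows) {
        return rows.stream()
                .map(row -> row.stream().map(c -> csvField(c.text())).collect(Collectors.joining(",")))
                .collect(Collectors.joining("\r\n"));
    }
}
```

The tolerance absorbs small baseline differences between labels that sit on the same visual line. On a dense P&ID you will probably need to tune it, or cluster on x as well, before the columns line up in Excel.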

Upvotes: 3
