Reputation:
I need to use command line switches to execute the 'Save as Text' command. Ideally, I want to:
Is this possible? If so, does anyone know how to do it?
Upvotes: 13
Views: 31214
Reputation: 227
I will also suggest ps2ascii for posterity. It is freely available and pre-installed on many platforms and converts a PDF to TXT quickly and easily:
ps2ascii my.pdf
(sends to stdout)
ps2ascii my.pdf > my.txt
For what it’s worth, ps2ascii is a wrapper around Ghostscript functionality.
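If you want to script the redirect shown above rather than type it, here is a minimal Python sketch. It assumes ps2ascii is on your PATH and prints the extracted text to stdout, as in the commands above. The function names pdf_to_text and save_as_text and the converter parameter are made up for illustration; they are not part of ps2ascii or Ghostscript.

```python
import subprocess
from pathlib import Path


def pdf_to_text(pdf_path, converter="ps2ascii"):
    """Run `<converter> <pdf>` and return whatever it prints to stdout."""
    result = subprocess.run(
        [converter, str(pdf_path)],
        capture_output=True,  # collect stdout instead of printing it
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return result.stdout


def save_as_text(pdf_path, converter="ps2ascii"):
    """Write the converter's output next to the PDF, e.g. my.pdf -> my.txt."""
    txt_path = Path(pdf_path).with_suffix(".txt")
    txt_path.write_bytes(pdf_to_text(pdf_path, converter))
    return txt_path
```

Calling save_as_text("my.pdf") should then have the same effect as ps2ascii my.pdf > my.txt. The output is kept as raw bytes so the script does not have to guess the text encoding.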
Upvotes: 0
Reputation: 58
I think the VBScript below should do the trick. It takes all .pdf files in a given folder and saves them as .txt files. One major bummer is that it only works while your machine is unlocked, since it uses the SendKeys command. If anyone has a solution that works while a computer is locked, please send it my way!
' Convert every .pdf in a folder to .txt by driving the PDF viewer's menus with SendKeys
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set WshShell = WScript.CreateObject("WScript.Shell")
objStartFolder = "PATH_OF_ALL_PDFS_YOU_WANT_TO_CONVERT_HERE"
Set objFolder = objFSO.GetFolder(objStartFolder)
Set colFiles = objFolder.Files
For Each objFile In colFiles
    extension = LCase(objFSO.GetExtensionName(objFile.Name))
    baseName = objFSO.GetBaseName(objFile.Name)
    fullname = objFSO.BuildPath(objStartFolder, objFile.Name)
    fullname_txt = objFSO.BuildPath(objStartFolder, baseName & ".txt")
    ' skip anything that is not a PDF or that already has a .txt next to it
    If extension = "pdf" And Not objFSO.FileExists(fullname_txt) Then
        WScript.Echo fullname
        ' open the PDF in its default viewer
        WshShell.Run """" & fullname & """"
        WScript.Sleep 1000
        ' Alt, f, h, x, Enter: the menu keystrokes for saving as text
        WshShell.SendKeys "%"
        WScript.Sleep 100
        WshShell.SendKeys "f"
        WScript.Sleep 100
        WshShell.SendKeys "h"
        WScript.Sleep 100
        WshShell.SendKeys "x"
        WScript.Sleep 300
        WshShell.SendKeys "{ENTER}"
        'this little step prevents the loop from moving on to the next .pdf before the conversion to .txt is complete
        done = False ' reset for every file; otherwise only the first file is waited for
        count = 0
        On Error Resume Next
        Do While Not done And count < 100
            Err.Clear
            Set MyFile = objFSO.OpenTextFile(fullname_txt, 8)
            If Err.Number = 0 Then
                MyFile.Close
                done = True
            End If
            count = count + 1
            WScript.Sleep 20000
        Loop
        On Error GoTo 0
    End If
Next
Upvotes: 1
Reputation: 3847
Maybe you can try this: https://github.com/luochen1990/nodejs-easy-pdf-parser
It is an npm package, so you need to install Node.js (and npm) to use it.
It can be used as a command line tool:
npm install -g easy-pdf-parser
pdf2text test.pdf > test.txt
This tool sorts text lines by their y coordinates, so it works well in most cases. It also handles Unicode well and is cross-platform.
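To illustrate what sorting text lines by y coordinate means, here is a rough Python sketch of the general idea. It is not the package's actual code. The (x, y, text) item format, the function name lines_from_items and the y_tolerance parameter are all assumptions made up for this example, and it assumes PDF-style coordinates where a larger y is higher on the page.

```python
def lines_from_items(items, y_tolerance=2.0):
    """Group (x, y, text) items into lines and return them top to bottom.

    Assumes PDF-style coordinates: the origin is at the bottom-left, so a
    larger y means higher on the page.
    """
    lines = []  # each entry: [line_y, [(x, text), ...]]
    # visit items from the top of the page to the bottom
    for x, y, text in sorted(items, key=lambda item: -item[1]):
        if lines and abs(lines[-1][0] - y) <= y_tolerance:
            # close enough to the current line's y: same line
            lines[-1][1].append((x, text))
        else:
            lines.append([y, [(x, text)]])
    # within a line, read left to right
    return [" ".join(t for _, t in sorted(parts)) for _, parts in lines]
```

The tolerance is there because fragments of one visual line often sit at slightly different y values. A real extractor would also need to deal with columns, rotated text and so on.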
Upvotes: 5
Reputation: 39
Don't use CMD; use AutoIt. It's very easy to do and takes only a few lines:
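Opt("WinTitleMatchMode", 2) ; match "Adobe" anywhere in the window title
ShellExecute("file.pdf") ; open the PDF in its associated viewer (Run only launches executables)
WinWait("Adobe")
; Send(?) goes here: whatever commands are necessary to save as text
Send("{ENTER}")
Send("!{F4}") ; Alt+F4 closes the viewer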
Upvotes: 3
Reputation: 4917
I don't understand why you wouldn't want to use free software (not freeware); pdftotext is the ideal solution. However, if you really just want to open and save the PDF in an automated fashion using the Windows GUI, you could use VBScript and the SendKeys command.
Just use pdftotext though; it would be much more reliable and won't cost you a whole box.
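For a whole folder of PDFs, something like the following Python sketch could do the batch job with pdftotext and no GUI or SendKeys at all. It assumes pdftotext (from Xpdf or Poppler) is installed and on your PATH. The helper names pending_pdfs, pdftotext_command and convert_folder are made up for this example, and -layout is optional.

```python
import subprocess
from pathlib import Path


def pending_pdfs(folder):
    """Return the PDFs in `folder` that have no matching .txt file yet."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.suffix.lower() == ".pdf" and not p.with_suffix(".txt").exists()
    )


def pdftotext_command(pdf_path, keep_layout=True):
    """Build the argument list: pdftotext [-layout] in.pdf out.txt"""
    cmd = ["pdftotext"]
    if keep_layout:
        cmd.append("-layout")  # keep the original physical layout of the text
    cmd += [str(pdf_path), str(Path(pdf_path).with_suffix(".txt"))]
    return cmd


def convert_folder(folder):
    """Convert every pending PDF by calling pdftotext once per file."""
    for pdf in pending_pdfs(folder):
        subprocess.run(pdftotext_command(pdf), check=True)
```

convert_folder("PATH_OF_ALL_PDFS_YOU_WANT_TO_CONVERT_HERE") would then skip files that already have a .txt next to them and convert the rest. Since nothing here touches the GUI, it should not matter whether the machine is locked.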
Upvotes: 4