bryan beverly


Using Command Line Switches to Save a PDF as Text - Can it be done?

I need to use command line switches to execute the 'Save as Text' command. Ideally, I want to:

  1. use a command line switch to open a PDF
  2. use a command line switch to convert the PDF to a text file by mimicking the 'Save as Text' command.
  3. use a command line switch to close the PDF.

Is this possible? If so, then does anyone know how to do this?

Upvotes: 13

Views: 31214

Answers (5)

TransferOrbit

Reputation: 227

I will also suggest ps2ascii for posterity. It is freely available and pre-installed on many platforms and converts a PDF to TXT quickly and easily:

  • ps2ascii my.pdf (writes the text to stdout)
  • ps2ascii my.pdf > my.txt (redirects the text to an output file)

For what it’s worth, ps2ascii is a wrapper around Ghostscript.
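If you want to drive the redirect form from a script instead of a shell, here is a minimal Python sketch, assuming ps2ascii is on your PATH. `capture_to_file` and `ps2ascii_to_file` are hypothetical helper names; the first just reproduces `cmd > file` by handing an open file to `subprocess.run` as stdout.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Sequence


def capture_to_file(cmd: Sequence[str], out_path: Path) -> None:
    """Run cmd and write its stdout to out_path, like `cmd > out_path` in a shell."""
    with open(out_path, "wb") as out:
        # check=True raises CalledProcessError if the command exits non-zero
        subprocess.run(list(cmd), stdout=out, check=True)


def ps2ascii_to_file(pdf: Path) -> Path:
    """Equivalent of `ps2ascii my.pdf > my.txt`, writing my.txt next to my.pdf."""
    exe = shutil.which("ps2ascii")
    if exe is None:
        raise FileNotFoundError("ps2ascii (part of Ghostscript) was not found on PATH")
    txt = pdf.with_suffix(".txt")
    capture_to_file([exe, str(pdf)], txt)
    return txt
```

For example, `ps2ascii_to_file(Path("my.pdf"))` writes my.txt and returns its path.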

Upvotes: 0

Acie

Reputation: 58

I think the VBScript below should do the trick. It takes every .pdf file in a given folder and saves it as a .txt file. One major bummer is that it only works while your machine is unlocked, because it relies on the SendKeys command. If anyone has a solution that works while the computer is locked, please send it my way!

Set objFSO = CreateObject("Scripting.FileSystemObject")
Set WshShell = WScript.CreateObject("WScript.Shell")
objStartFolder = "PATH_OF_ALL_PDFS_YOU_WANT_TO_CONVERT_HERE"
Set objFolder = objFSO.GetFolder(objStartFolder)

For Each objFile In objFolder.Files
  ' LCase so that "REPORT.PDF" is picked up as well as "report.pdf"
  extension = LCase(objFSO.GetExtensionName(objFile.Name))
  file = objFSO.GetBaseName(objFile.Name)
  fullname = objFSO.BuildPath(objStartFolder, objFile.Name)
  fullname_txt = objFSO.BuildPath(objStartFolder, file & ".txt")

  If extension = "pdf" And Not objFSO.FileExists(fullname_txt) Then
    WScript.Echo fullname
    ' Open the PDF in the default viewer, then drive its menu with keystrokes
    WshShell.Run """" & fullname & """"
    WScript.Sleep 1000
    WshShell.SendKeys "%"
    WScript.Sleep 100
    WshShell.SendKeys "f"
    WScript.Sleep 100
    WshShell.SendKeys "h"
    WScript.Sleep 100
    WshShell.SendKeys "x"
    WScript.Sleep 300
    WshShell.SendKeys "{ENTER}"

    'this little step prevents the loop from moving on to the next .pdf before the conversion to .txt is complete:
    'OpenTextFile(..., 8) fails until the .txt exists (and, presumably, while the viewer is still writing it)
    done = False   ' reset for every file, otherwise only the first PDF is waited for
    count = 0
    Do While Not done And count < 100
      On Error Resume Next
      Set MyFile = objFSO.OpenTextFile(fullname_txt, 8)
      If Err.Number = 0 Then
        MyFile.Close   ' release the handle instead of leaving the file open
        done = True
      End If
      Err.Clear
      On Error GoTo 0
      count = count + 1
      If Not done Then WScript.Sleep 20000
    Loop
  End If
Next
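For anyone who needs the same "wait until the .txt is finished" step outside VBScript, here is a minimal Python sketch of the polling loop. It swaps in a different check: instead of trying to open the file for appending (which relies on the viewer locking the file while it writes), it waits until the file exists and its size is unchanged between two polls. `wait_for_stable_file`, `interval` and `max_checks` are hypothetical names, and a size that holds steady for one interval is a heuristic, not a guarantee that the writer is done.

```python
import os
import time


def wait_for_stable_file(path: str, interval: float = 1.0, max_checks: int = 100) -> bool:
    """Poll until path exists and its size is the same on two consecutive checks.

    Returns True once the size has held steady for one interval, or False after
    max_checks polls without that happening.
    """
    last_size = None
    for _ in range(max_checks):
        if os.path.exists(path):
            size = os.path.getsize(path)
            if size == last_size:
                return True
            last_size = size
        else:
            # File not there yet: forget any earlier size and keep waiting
            last_size = None
        time.sleep(interval)
    return False
```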

Upvotes: 1

luochen1990

Reputation: 3847

Maybe you can try this: https://github.com/luochen1990/nodejs-easy-pdf-parser

It is an npm package, so you need Node.js (and npm) installed to use it.

It can be used as a command line tool:

npm install -g easy-pdf-parser
pdf2text test.pdf > test.txt

The tool sorts text lines by their y coordinates, so it works well in most cases. It also handles Unicode well and is cross-platform.
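To illustrate what sorting lines by y coordinate means (this is not easy-pdf-parser's code), here is a Python sketch that groups positioned text fragments into lines. It assumes each fragment is an `(x, y, text)` tuple with y growing down the page, and treats fragments whose y values differ by at most a hypothetical `tolerance` as belonging to the same line. Native PDF coordinates have y growing upward, so with raw PDF-space values you would sort by `-y` instead.

```python
from typing import List, Tuple

# (x, y, text); y is assumed to grow downward (top of the page first)
Fragment = Tuple[float, float, str]


def fragments_to_lines(fragments: List[Fragment], tolerance: float = 2.0) -> List[str]:
    """Group fragments into text lines, top to bottom, each read left to right."""
    lines: List[List[Fragment]] = []
    for frag in sorted(fragments, key=lambda f: (f[1], f[0])):
        # Same line if close enough to the y of the line's first fragment
        if lines and abs(frag[1] - lines[-1][0][1]) <= tolerance:
            lines[-1].append(frag)
        else:
            lines.append([frag])
    return [" ".join(text for _, _, text in sorted(line, key=lambda f: f[0])) for line in lines]
```

For example, fragments `(50, 100.5, "world")`, `(10, 100, "Hello")`, `(10, 120, "Second")` and `(60, 120, "line")` come out as `["Hello world", "Second line"]`, even though they arrive out of order.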

Upvotes: 5

AutoDoc

Reputation: 39

Don't use CMD; use AutoIt. It's very easy to do and only takes a few lines:

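Opt("WinTitleMatchMode", 2)   ; match "Adobe" anywhere in the window title, not just at the start
ShellExecute("file.pdf")      ; open the PDF with its associated viewer (Run() is for executables)
WinWait("Adobe")
WinActivate("Adobe")
; Send(?) ; whatever keystrokes are necessary to save as text
Send("{ENTER}")
Send("!{F4}")                 ; Alt+F4 closes the viewer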

Upvotes: 3

Gareth Davidson

Reputation: 4917

I don't understand why you wouldn't want to use free software (not just freeware); pdftotext is the ideal solution. However, if you really want to open and save the PDF in an automated fashion through the Windows GUI, you could use VBScript and the SendKeys command.

Just use pdftotext, though; it would be much more reliable and won't cost you a whole box.
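A minimal Python sketch of the pdftotext route for a whole folder, assuming pdftotext (shipped with Poppler and Xpdf) is on your PATH. It skips PDFs that already have a .txt next to them, so it can be rerun safely. `pending_pdfs` and `convert_folder` are hypothetical helper names, and `-layout` (keep the physical page layout) is an optional choice you can drop.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def pending_pdfs(folder: Path) -> List[Path]:
    """PDFs in folder (any case of .pdf) that do not yet have a .txt beside them."""
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf" and not p.with_suffix(".txt").exists()
    )


def convert_folder(folder: Path) -> List[Path]:
    """Run `pdftotext -layout in.pdf in.txt` for each pending PDF; return the .txt paths."""
    exe = shutil.which("pdftotext")
    if exe is None:
        raise FileNotFoundError("pdftotext not found on PATH (install Poppler or Xpdf)")
    written = []
    for pdf in pending_pdfs(folder):
        txt = pdf.with_suffix(".txt")
        # check=True raises CalledProcessError if pdftotext exits non-zero
        subprocess.run([exe, "-layout", str(pdf), str(txt)], check=True)
        written.append(txt)
    return written
```

For example, `convert_folder(Path(r"C:\pdfs"))` converts everything still pending in that folder. Since nothing is typed into a GUI window, the machine does not need to be unlocked.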

Upvotes: 4
