CDC

Reputation: 21

Running ghostscript in Python using subprocess not working

Okay, here's a link to my input.pdf file, in case someone would like to try getting subprocess.Popen to work with Ghostscript to convert this PDF file to a text file. I've tried everything I can think of based on your answers so far.

Link to input.pdf file

Again, running Ghostscript directly from the command line works fine. This is what I get at the command prompt:

C:\Misc>gswin32c -dBATCH -dNOPAUSE -sPageList=1- -sDEVICE=txtwrite -sOutputFile=output.txt input.pdf
GPL Ghostscript 9.21 (2017-03-16)
Copyright (C) 2017 Artifex Software, Inc.  All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
Processing pages 1-.
Page 1
Page 2
Page 3
Page 4
Page 5

However, the following Python code fails with non-zero exit status 1. In the subprocess.check_output() call, I've replaced the file name variables with literal paths to keep things simple.

import subprocess, os

#path for ghostscript executable
gs_path = 'C:\\Program Files\\gs\\gs9.21\\bin'

# dir where pdfs are located
dir_name = 'C:\\Misc'

# pdf to extract text from
file_name = os.path.join(dir_name,'input.pdf')

# output text file name
outfile_name = os.path.join(dir_name,'output.txt')

os.chdir(dir_name)


subprocess.check_output(['gswin32c',
'-dBATCH', '-dNOPAUSE', '-sPageList=1-', 'sDEVICE=txtwrite',
'-sOUTPUTFILE=C:\\Misc\\output.txt',
'C:\\Misc\\input.pdf'])

The complete output from Python is (Anaconda Python 3.5):

runfile('I:/Downloads/0.data/help_pdf.py', wdir='I:/Downloads/0.data')
Traceback (most recent call last):

  File "<ipython-input-26-0ba7dc571302>", line 1, in <module>
    runfile('I:/Downloads/0.data/help_pdf.py', wdir='I:/Downloads/0.data')

  File "C:\Custom_Programs\Anaconda\lib\site-packages\spyder\utils\site\sitecustomize.py", line 688, in runfile
    execfile(filename, namespace)

  File "C:\Custom_Programs\Anaconda\lib\site-packages\spyder\utils\site\sitecustomize.py", line 101, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "I:/Downloads/0.data/help_pdf.py", line 28, in <module>
    'C:\\Misc\\input.pdf'])

  File "C:\Custom_Programs\Anaconda\lib\subprocess.py", line 316, in check_output
    **kwargs).stdout

  File "C:\Custom_Programs\Anaconda\lib\subprocess.py", line 398, in run
    output=stdout, stderr=stderr)

CalledProcessError: Command '['gswin32c', '-dBATCH', '-dNOPAUSE', '-sPageList=1-', 'sDEVICE=txtwrite', '-sOUTPUTFILE=C:\\Misc\\output.txt', 'C:\\Misc\\input.pdf']' returned non-zero exit status 1

Seems like it's having a problem with the input.pdf file path.

Upvotes: 2

Views: 2996

Answers (1)

Charles Duffy

Reputation: 295262

You're misinterpreting the upstream documentation. When it tells you:

gs -dSAFER -dBATCH -dNOPAUSE -sDEVICE=png16m -dGraphicsAlphaBits=4 \
    -sOutputFile=tiger.png tiger.eps

-sOutputFile=tiger.png and tiger.eps are two separate arguments, so you need to pass them as separate elements of the argument list.
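
As a rough illustration (not taken from the Ghostscript documentation), Python's shlex.split can show how a POSIX-style shell would split that example into separate arguments. This sketch assumes the backslash line continuation has already been joined into a single line; the variable names are just for illustration. Windows cmd uses different quoting rules, but for a simple command like this one the split should come out the same.

```python
import shlex

# The documentation example, with the line continuation joined into one line.
doc_example = ("gs -dSAFER -dBATCH -dNOPAUSE -sDEVICE=png16m "
               "-dGraphicsAlphaBits=4 -sOutputFile=tiger.png tiger.eps")

# Split the command line the way a POSIX shell would.
args = shlex.split(doc_example)

# Each printed line is one list element to hand to subprocess.
for index, arg in enumerate(args):
    print(index, arg)
```

The last two entries are '-sOutputFile=tiger.png' and 'tiger.eps', each in its own position, which is the shape subprocess expects when it is given a list.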


The following code has been tested successfully on macOS with the input.pdf you provided:

#!/usr/bin/env python3.5
import subprocess, sys

gs = 'gswin32c' if (sys.platform == 'win32') else 'gs'
file_name = 'input.pdf'
outfile_name = 'output.txt'

subprocess.check_output([gs,
    '-dBATCH', '-dNOPAUSE', '-sPageList=1-', '-sDEVICE=txtwrite',
    '-sOUTPUTFILE=%s' % (outfile_name,), file_name])

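Note also that '-sDEVICE=txtwrite' starts with a -. In your code it was written as 'sDEVICE=txtwrite', without the leading dash, so Ghostscript would not have recognized it as a switch.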
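
When check_output raises CalledProcessError, the traceback shows only the exit status, not what the program printed. One way to see the child's own messages is to merge stderr into stdout and read the exception's output attribute. Here is a minimal sketch of that idea; run_and_capture is a hypothetical helper name, and the demo uses a small Python child process as a stand-in for a failing Ghostscript call so that it runs anywhere.

```python
import subprocess
import sys


def run_and_capture(cmd):
    """Run cmd and return (exit_status, combined stdout/stderr as text)."""
    try:
        # stderr=STDOUT merges error messages into the captured output.
        out = subprocess.check_output(cmd, stderr=subprocess.STDOUT)
        return 0, out.decode(errors="replace")
    except subprocess.CalledProcessError as exc:
        # exc.output holds whatever the child printed before exiting non-zero.
        return exc.returncode, exc.output.decode(errors="replace")


# Stand-in for a failing command: writes to stderr and exits with status 1.
failing_cmd = [sys.executable, "-c",
               "import sys; sys.stderr.write('simulated error'); sys.exit(1)"]
status, text = run_and_capture(failing_cmd)
print("exit status:", status)
print("child output:", text)
```

To use it with Ghostscript, pass the same argument list as in the code above in place of failing_cmd; whatever message Ghostscript prints about the argument it could not handle should then appear in the returned text.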

Upvotes: 1
