Jason

Reputation: 1130

Passing argument to pdf2txt function

I'm trying to use PDFMiner to extract text from a PDF file. I want to use the pdf2txt.py script to run the sample from

http://www.unixuser.org/~euske/python/pdfminer/index.html

with this single line

pdf2txt.py samples/simple1.pdf

Since I'm working on Windows with IDLE, I ran the following within IDLE:

import pdf2txt

pdf2txt.main(['C:\Users\Desktop\Dictionary Construction\simple1.pdf'])

Each time it gave me

usage: C:\Usersernor\Desktop\Dictionary Construction\simple1.pdf [-d] [-p pagenos] [-m maxpages] [-P password] [-o output] [-C] [-n] [-A] [-V] [-M char_margin] [-L line_margin] [-W word_margin] [-F boxes_flow] [-Y layout_mode] [-O output_dir] [-R rotation] [-t text|html|xml|tag] [-c codec] [-s scale] file ...

I know it's an error message telling me that the argument was not parsed. The first few lines of pdf2txt.py are as follows:

def main(argv):
    import getopt
    def usage():
        print ('usage: %s [-d] [-p pagenos] [-m maxpages] [-P password] [-o output]'
               ' [-C] [-n] [-A] [-V] [-M char_margin] [-L line_margin] [-W word_margin]'
               ' [-F boxes_flow] [-Y layout_mode] [-O output_dir] [-R rotation]'
               ' [-t text|html|xml|tag] [-c codec] [-s scale]'
               ' file ...' % argv[0])
        return 100
    try:
        # argv[0] is treated as the program name; only argv[1:] is parsed
        (opts, args) = getopt.getopt(argv[1:], 'dp:m:P:o:CnAVM:L:W:F:Y:O:R:t:c:s:')
    except getopt.GetoptError:
        return usage()

How should I format my argument to make it work? I know it's a dumb question, but it's driving me nuts.

Please help me!

Thanks,

Jason

Updates

Following Luis's advice, I changed the command to

pdf2txt.main(['simple1.html','mypdf.pdf'])

Now it prints the output in the shell window, but I cannot find the output file 'simple1.html'. I tried the following commands:

pdf2txt.main(['-o C:\Users\Desktop\Dictionary Construction\simple1.html','mypdf.pdf'])

pdf2txt.main(['C:\Users\Desktop\Dictionary Construction\simple1.html','mypdf.pdf'])

Neither of them produced a file in the folder I designated.
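The behaviour described above can be reproduced without PDFMiner. This is a minimal sketch, assuming main() handles its argument exactly as in the excerpt quoted in the question (getopt applied to argv[1:]). The helper name parse_like_pdf2txt is made up for illustration and is not part of pdf2txt.py.

```python
import getopt

# Same option string that the quoted excerpt of pdf2txt.py passes to getopt.
OPTSTRING = 'dp:m:P:o:CnAVM:L:W:F:Y:O:R:t:c:s:'

def parse_like_pdf2txt(argv):
    """Hypothetical helper: parse argv the way the quoted excerpt does.

    argv[0] is skipped (it is treated as the program name), so only
    argv[1:] reaches getopt.
    """
    opts, args = getopt.getopt(argv[1:], OPTSTRING)
    return opts, args

# Raw string so the backslashes in the Windows path stay literal.
pdf = r'C:\Users\Desktop\Dictionary Construction\simple1.pdf'

# 1) One-element list: the path is consumed as argv[0], no input file remains.
print(parse_like_pdf2txt([pdf]))

# 2) Two-element list: the first element is skipped, the second is the input.
print(parse_like_pdf2txt(['simple1.html', 'mypdf.pdf']))

# 3) '-o ...' as the first element is also skipped, so no output file is set.
print(parse_like_pdf2txt(['-o simple1.html', 'mypdf.pdf']))

# 4) Placeholder program name first, then '-o' and its value as separate items.
print(parse_like_pdf2txt(['pdf2txt.py', '-o', 'simple1.html', 'mypdf.pdf']))
```

In cases 2 and 3 no -o option is seen, which would explain why the text shows up in the shell window and no file is written. Only case 4 yields an ('-o', 'simple1.html') pair.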

Upvotes: 0

Views: 2426

Answers (1)

Luis Ávila

Reputation: 699

You should call it as:

pdf2txt.py samples/simple1.txt samples/simple1.pdf

if you want, say, samples/simple1.txt to be the output.
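When calling main() from IDLE rather than from the command line, the list has to mirror sys.argv: a program name in slot 0, then each option and its value as separate items. Below is a rough sketch of one way to build such a list, going by the "-o output" flag in the usage message quoted in the question. build_pdf2txt_argv is a hypothetical helper, not something PDFMiner provides, and the paths are the ones from the question.

```python
def build_pdf2txt_argv(pdf_path, out_path=None, prog='pdf2txt.py'):
    """Hypothetical helper: build an argv-style list for pdf2txt.main().

    prog only fills the argv[0] slot, which main() uses in its usage
    message and otherwise skips.
    """
    argv = [prog]
    if out_path is not None:
        # '-o' and its value must be two separate list items.
        argv += ['-o', out_path]
    argv.append(pdf_path)
    return argv

# Raw strings keep the backslashes literal. In Python 3 a plain
# 'C:\Users...' literal is a syntax error because of the \U escape.
pdf = r'C:\Users\Desktop\Dictionary Construction\simple1.pdf'
out = r'C:\Users\Desktop\Dictionary Construction\simple1.html'

argv = build_pdf2txt_argv(pdf, out)
print(argv)

# With PDFMiner's tools directory importable, the call would then be:
#   import pdf2txt
#   pdf2txt.main(argv)
```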

Upvotes: 1
