Path not printing string values

Question

I recently found this really handy library (pdfminer) for PDF conversion. I am trying to convert a PDF to string values so that I can parse the data and write it to a CSV file. I want to automate this for future use, so I cannot use Tabula.

I am calling some modules to convert the PDF to a string, but the string conversion part is not working. Here is the conversion part of my script (pdf2string.py).

I get no error and the script finishes successfully, but there is no output.

from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
from pdfminer.converter import HTMLConverter
from pdfminer.converter import TextConverter
from pdfminer.layout import LAParams
from pdfminer.pdfpage import PDFPage
from cStringIO import StringIO
import re
import csv
import sys

def convert_pdf_to_html(path):
    rsrcmgr = PDFResourceManager()
    retstr = StringIO()
    codec = 'utf-8'
    laparams = LAParams()
    device = HTMLConverter(rsrcmgr, retstr, codec=codec, laparams=laparams)
    fp = file(path, 'rb')
    interpreter = PDFPageInterpreter(rsrcmgr, device)
    password = ""
    maxpages = 0 #is for all
    caching = True
    pagenos=set()
    for page in PDFPage.get_pages(fp, pagenos, maxpages=maxpages, password=password, caching=caching, check_extractable=True):
        interpreter.process_page(page)
    fp.close()
    device.close()
    str = retstr.getvalue()
    retstr.close()
    return str

    print str

if __name__ == '__main__':
    if len(sys.argv) == 2:
        path = sys.argv[1]
        convert_pdf_to_html(path)

This is the command I run in bash:

python pdf2string.py example.pdf

The script is pdf2string.py and the path is example.pdf.

I am also new to high-level logic in Python.

pachewise · Accepted Answer

Edit: you are returning before printing, so the print str line after return str is never reached. Either remove return str, or remove print str and use the advice below.

You're not printing the return value of convert_pdf_to_html() or saving it anywhere. Print it at the call site:

print convert_pdf_to_html(path)
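
A minimal sketch of the difference, written in Python 3 syntax (the question's script is Python 2). The pdfminer work is replaced by a hypothetical stand-in string so the example runs on its own; convert_with_late_print, convert and run_and_capture are illustrative names, not part of the original script.

```python
import contextlib
import io


def convert_with_late_print(path):
    """Mirrors the question: the print sits after the return."""
    text = "text extracted from " + path  # stand-in for the pdfminer work
    return text
    print(text)  # never runs: the function has already returned


def convert(path):
    """Fixed shape: return the string and let the caller decide what to do."""
    return "text extracted from " + path  # stand-in for the pdfminer work


def run_and_capture(func, path):
    """Call func(path) and return (return value, text written to stdout)."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        value = func(path)
    return value, buffer.getvalue()


# The buggy shape returns a value but writes nothing to stdout.
late_value, late_output = run_and_capture(convert_with_late_print, "example.pdf")

# The fix: print (or save) the return value at the call site.
result = convert("example.pdf")
print(result)
```

Keeping the function to a plain return and printing in the caller also makes it easy to pass the string on to later parsing or CSV-writing code instead of only displaying it.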
