Sina

Reputation: 270

Edit existing PDF's pages in Python

I have a PDF file from which I removed some pages. I want to correct the page numbers in the new PDF. Is there any way or library to update the page numbers without converting the PDF to another format? I have tried converting the PDF to text, XML, and JSON and then fixing the page numbers. However, when I convert it back to PDF, it looks messy (the style of the original PDF is not preserved). The problems I have are:

  1. Removing the old page numbers.
  2. Adding new page numbers.

I am using Python on Ubuntu. I have tried ReportLab, PyX, and pyfpdf.

Upvotes: 6

Views: 1128

Answers (1)

Preto

Reputation: 78

I had a similar problem and honestly could not fully solve it; in the end, I fetched the corresponding HTML and processed it with BeautifulSoup. However, I got closer than with the Python modules by using pdftotext.exe from Poppler (link at the bottom) to read the PDF file. It worked fine, except that it could not distinguish between text columns. Since this is not a Python module, I used os.system to run the command string on the .exe file.

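import os


def call_poppler(input_pdf, input_path):
    """
    Call poppler's pdftotext to generate a .txt file next to the PDF.
    input_path is the path to the pdftotext executable.
    """
    # Paths containing spaces would need quoting for the shell
    command_row = input_path + " " + input_pdf
    status = os.system(command_row)
    if status != 0:
        raise RuntimeError("pdftotext failed with status %d" % status)
    # pdftotext names its output after the PDF, replacing ".pdf" with ".txt"
    txt_name = os.path.splitext(input_pdf)[0] + ".txt"
    processed_paper = open_txt(txt_name)
    return processed_paper


def open_txt(input_txt_name):
    """
    Open the txt file produced by poppler and return
    its lines as a list of stripped strings.
    """
    output_file = []
    with open(input_txt_name, "rb") as opened_file:
        for row in opened_file:
            output_file.append(row.decode("utf-8").strip())
    return output_file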
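The same pdftotext call can also be made with subprocess.run and an argument list instead of an os.system command string, which avoids shell-quoting problems when a path contains spaces and raises an error if pdftotext fails. This is a minimal sketch, assuming pdftotext is on the PATH unless you pass the executable path; the function names build_pdftotext_command and pdf_to_text are hypothetical. It relies on two pdftotext options: `-layout` (keep the original physical layout, which may or may not help with multi-column pages, depending on the document) and `-` as the output file (write to standard output instead of a .txt file).

```python
import subprocess


def build_pdftotext_command(input_pdf, executable="pdftotext", layout=True):
    """Build the argument list for Poppler's pdftotext (hypothetical helper)."""
    command = [executable]
    if layout:
        # -layout asks pdftotext to keep the original physical layout
        command.append("-layout")
    # "-" as the output file makes pdftotext write to stdout
    command += [input_pdf, "-"]
    return command


def pdf_to_text(input_pdf, executable="pdftotext", layout=True):
    """Run pdftotext and return its output as a single string."""
    completed = subprocess.run(
        build_pdftotext_command(input_pdf, executable, layout),
        capture_output=True,
        check=True,  # raise CalledProcessError if pdftotext exits with an error
    )
    return completed.stdout.decode("utf-8", errors="replace")
```

For example, `pdf_to_text("paper.pdf")` (a hypothetical file name) returns the whole text in one string, with page breaks still marked by form feed characters, instead of leaving a .txt file on disk.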

call_poppler returns the contents of the generated ".txt" file as a list of lines, which you can then process as you want and write back to a PDF with some module, such as pypdf. Sorry if this is not the answer you wanted, but PDF files are rather hard to handle in Python since they are not plain-text files. Do not forget to pass the path of the executable as input_path. You can get Poppler here: https://poppler.freedesktop.org/
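As an illustration of the text-processing step for this question (removing old page numbers and adding new ones), here is a minimal, stdlib-only sketch that works on the raw pdftotext output, for example the .txt file read with open(...).read(). It assumes pages are separated by form feed characters, which pdftotext inserts at page breaks by default; note that open_txt loses them, because str.strip() treats a form feed as whitespace. The function name renumber_pages and the page-number heuristic (the last non-empty line of a page counts as the old page number only if it contains nothing but digits) are hypothetical. This only edits the extracted text; it does not change the original PDF, so the styling problem described in the question remains.

```python
import re

# Hypothetical heuristic: a page-number line contains only digits,
# optionally surrounded by whitespace, e.g. "  12  "
PAGE_NUMBER_LINE = re.compile(r"^\s*\d+\s*$")


def renumber_pages(text, start=1):
    """Return one string per page, with old page numbers replaced by new ones."""
    # pdftotext separates pages with form feeds ("\f")
    pages = text.split("\f")
    # Output usually ends with a form feed, leaving an empty last chunk
    if pages and pages[-1].strip() == "":
        pages.pop()
    result = []
    for offset, page in enumerate(pages):
        lines = page.rstrip("\n").split("\n")
        # Drop trailing blank lines so the last line is the last real text
        while lines and lines[-1].strip() == "":
            lines.pop()
        # 1. Remove the old page number if the last line is a bare number
        if lines and PAGE_NUMBER_LINE.match(lines[-1]):
            lines.pop()
        # 2. Add the new sequential page number
        lines.append(str(start + offset))
        result.append("\n".join(lines))
    return result
```

For instance, `renumber_pages("Intro\n\n3\n\fBody text\n7\n\f")` gives `["Intro\n\n1", "Body text\n2"]`. A page whose last line is not a bare number keeps all its text and simply gets the new number appended.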

Upvotes: 3
