Reputation: 71

How to get a PDF version of the TensorFlow documentation?

I want a PDF version of the TensorFlow documentation, as a PDF is more convenient to read and annotate.

There is a similar question, but it is out of date.

This is the source of the TF documentation. I haven't found any PDF generation scripts in it yet. We could try converting the md/ipynb/html files to PDF, but that is rather troublesome.

Is there a more straightforward and convenient way? Thanks!

Upvotes: 2

Answers (1)

xia

Reputation: 71

I wrote a script that can generate a PDF, though some problems remain, such as the TOC.

Resulting PDF: https://github.com/hxsnow10/tensorflow_doc_pdf

HTML generation

# Convert a directory of .md and .ipynb files into a single HTML file (Python 3)

import os
import pdfkit  # only needed for the PDF step, not used in this script
import markdown


in_dir = 'site/en/tutorials'   # relative path inside the docs checkout
out_dir = os.path.join("html", in_dir)
out_html_path = os.path.join(out_dir, "index.html")
out_images_path = os.path.join(out_dir, "images")

# Start from a clean output directory
cmd = "rm -rf {} ; mkdir -p {}".format(out_dir, out_images_path)
os.system(cmd)

# Create an empty index.html that the loop below appends to
cmd = "echo '' >  {}".format(out_html_path)
os.system(cmd)

for root, subFolders, files in os.walk(in_dir):
    # Copy images next to index.html so relative "images/..." links still resolve
    if 'images' in subFolders:
        path = os.path.join(root, 'images')
        m_files1 = os.listdir(out_images_path)
        m_files2 = os.listdir(path)
        if set(m_files1) & set(m_files2):
            print('Conflicting image names:', m_files1, m_files2)
        # A recursive merge would be better; cp overwrites files with the same name
        cmd = "cp -rf {}/* {}".format(path, out_images_path)
        os.system(cmd)

    nb_file_paths = [os.path.join(root, filename) for filename in files if filename.endswith('.ipynb')]
    md_file_paths = [os.path.join(root, filename) for filename in files if filename.endswith('.md')]

    # Convert each Markdown file to HTML and append it to index.html
    print("md_files =", md_file_paths)
    for file_path in md_file_paths:
        out_path = file_path[:-3] + '.html'
        markdown.markdownFromFile(
            input=file_path,
            output=out_path,
            encoding='utf8'
        )
        cmd = "cat {} >> {}".format(out_path, out_html_path)
        os.system(cmd)

    # Convert all notebooks in this directory in one nbconvert call
    print("nb_files =", nb_file_paths)
    if nb_file_paths:
        quoted = ' '.join('"{}"'.format(f) for f in nb_file_paths)
        cmd = 'jupyter nbconvert {} --to html --stdout >> {}'.format(quoted, out_html_path)
        os.system(cmd)

This generates a single HTML file with images.

PDF generation
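The image-copy step in the script shells out to cp and only warns when two top-level image names collide. A rough pure-Python sketch of a recursive merge is below. The name `merge_images` is hypothetical (not part of the script above), and keeping the first copy on a collision is just one possible policy.

```python
import os
import shutil


def merge_images(src_dir, dst_dir):
    """Recursively copy src_dir into dst_dir.

    Files that already exist in dst_dir are NOT overwritten; their paths
    (relative to src_dir) are returned so the caller can inspect them.
    """
    collisions = []
    for root, _dirs, files in os.walk(src_dir):
        rel_root = os.path.relpath(root, src_dir)
        target_root = os.path.normpath(os.path.join(dst_dir, rel_root))
        os.makedirs(target_root, exist_ok=True)
        for name in files:
            target = os.path.join(target_root, name)
            if os.path.exists(target):
                # Same relative name was copied earlier: keep the first copy.
                collisions.append(os.path.normpath(os.path.join(rel_root, name)))
                continue
            shutil.copy2(os.path.join(root, name), target)
    return collisions
```

Calling `merge_images(path, out_images_path)` inside the `os.walk` loop would replace the `cp -rf` command, and the returned list shows which images need renaming.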

pdfkit.from_file cannot save the images, but https://html2pdf.com/ does a good job.
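One thing that might sidestep the missing-image problem (I have not verified it against pdfkit) is to embed the local images directly in the HTML as base64 data URIs before converting, so the converter never has to resolve relative paths. A minimal sketch using only the standard library; `inline_images` is a hypothetical helper name, and the regex only handles plain quoted `src` attributes:

```python
import base64
import mimetypes
import os
import re

# Matches <img ... src="..."> or <img ... src='...'> and captures the path.
IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc=)(["\'])(.*?)\2', re.IGNORECASE)


def inline_images(html, base_dir):
    """Replace local <img src> references with base64 data URIs."""
    def replace(match):
        prefix, quote, src = match.groups()
        if re.match(r'^(?:[a-z][a-z0-9+.-]*:|//)', src, re.IGNORECASE):
            return match.group(0)  # remote URL or existing data URI: leave as-is
        path = os.path.join(base_dir, src)
        if not os.path.isfile(path):
            return match.group(0)  # file not found: leave the tag untouched
        mime = mimetypes.guess_type(path)[0] or "application/octet-stream"
        with open(path, "rb") as f:
            data = base64.b64encode(f.read()).decode("ascii")
        return "{}{}data:{};base64,{}{}".format(prefix, quote, mime, data, quote)

    return IMG_SRC.sub(replace, html)
```

The output is a self-contained HTML string that can be written back to index.html and passed to whichever converter you use. The file gets larger, but no image paths are left to break.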
The TOC has to be added manually, e.g. with https://github.com/yutayamamoto/pdfoutline.
TODO: the TOC would be better handled by merging with PyPDF2, getting the page numbers automatically.
TODO: a human-curated TOC structure should be prepared before merging.
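The "auto getting page num" part of the TOC is mostly arithmetic: if each section is rendered to its own PDF, the start page of each section in the merged file is the running sum of the page counts before it. A small sketch under that assumption; `toc_entries` is a hypothetical name and the page counts below are made up. In practice they would come from whatever PDF library does the merge.

```python
def toc_entries(parts):
    """Compute 1-based start pages for sections merged in order.

    parts: list of (title, page_count) tuples, in merge order.
    Returns a list of (title, first_page) tuples.
    """
    entries = []
    next_page = 1
    for title, page_count in parts:
        if page_count < 1:
            raise ValueError("page_count must be >= 1 for {!r}".format(title))
        entries.append((title, next_page))
        next_page += page_count
    return entries


# Made-up page counts, for illustration only.
print(toc_entries([("Tutorials", 12), ("Guide", 30), ("API", 5)]))
# prints [('Tutorials', 1), ('Guide', 13), ('API', 43)]
```

The hand-organized TOC from the second TODO would just decide the order and titles of `parts`; the page numbers then follow automatically.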

I may try more if I have time.
