Cédric Van Rompay

Reputation: 2959

Removing docinfo from docutils output with the HTML5 writer

With the docutils Python library, when using the html5 writer, I cannot find a way to keep the docinfo (the fields at the beginning of the source) out of the output.

Here is a minimal example:

import docutils.io, docutils.core

SOURCE = '''\
:key: value

Title
========

Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod
tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At
vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren,
no sea takimata sanctus est Lorem ipsum dolor sit amet.
'''

docutils_params = {
    'input_encoding': 'utf-8',
}

pub = docutils.core.Publisher(
    source_class=docutils.io.StringInput,
    destination_class=docutils.io.StringOutput)
pub.set_components('standalone', 'restructuredtext', 'html5')
pub.process_programmatic_settings(None, docutils_params, None)
pub.set_source(SOURCE)
pub.publish()

# parts['body'] gives the same result
body = pub.writer.parts['fragment']

print(body)

Here are the first lines of the output:

<dl class="docinfo simple">
<dt class="key">key</dt>
<dd class="key"><p>value</p>
</dd>
</dl>
<div class="section" id="title">
<h1>Title</h1>
<p>Lorem ipsum dolor sit amet,

What I don't want is the whole <dl class="docinfo simple"> element.

The HTML5 writer is selected in the pub.set_components(...) line. If I use html instead, I don't have this problem, but I need the HTML5 writer rather than the standard HTML one.

What's strange is that the documentation seems to say that using pub.writer.parts['fragment'] (or, equivalently, pub.writer.parts['body']) should leave the docinfo out of the output:

parts['fragment'] contains the document body (not the HTML <body>). In other words, it contains the entire document, less the document title, subtitle, docinfo, header, and footer.

source: http://docutils.sourceforge.net/docs/api/publisher.html
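To check which part the docinfo ends up in on a given installation, here is a minimal sketch that uses `docutils.core.publish_parts` (a shortcut for the Publisher set-up above) and compares the `html` and `html5` writers. The helper `docinfo_in_fragment` is hypothetical, and it only searches `parts['fragment']` for the `docinfo` CSS class, so the result depends on the installed Docutils version.

```python
from docutils.core import publish_parts

# Shortened version of the question's source: a leading field list
# (turned into docinfo by the standalone reader) followed by a section.
SOURCE = """\
:key: value

Title
========

Lorem ipsum dolor sit amet.
"""


def docinfo_in_fragment(source, writer_name):
    """Return True if docinfo markup appears in parts['fragment'].

    Hypothetical helper: it only searches for the 'docinfo' CSS class,
    which the Docutils HTML writers put on the docinfo element.
    """
    parts = publish_parts(source=source, writer_name=writer_name)
    return 'docinfo' in parts['fragment']


for name in ('html', 'html5'):
    print(name, docinfo_in_fragment(SOURCE, name))
```

With the behaviour described in the question, this would print True for html5 and False for html.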

Am I doing something wrong, or is this a bug in the docutils HTML5 writer?

Upvotes: 3

Views: 252

Answers (2)

Cédric Van Rompay

Reputation: 2959

Docutils developers identified this as a bug and patched it.

See the threads on the docutils-users mailing list here and here.
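On a Docutils release without the fix, one possible workaround (my own sketch, not taken from the mailing-list threads) is to delete the docinfo node from the document tree before handing it to the HTML5 writer. It assumes a Docutils release that ships `docutils.writers.html5_polyglot` (the module behind the `html5` writer name); the helper name `html5_fragment_without_docinfo` is hypothetical.

```python
from docutils import nodes
from docutils.core import publish_doctree, publish_from_doctree
from docutils.writers import html5_polyglot

SOURCE = """\
:key: value

Title
========

Lorem ipsum dolor sit amet.
"""


def html5_fragment_without_docinfo(source):
    """Hypothetical helper: render with the HTML5 writer, minus docinfo."""
    # Parse with the standalone reader; its DocInfo transform turns the
    # leading field list into a nodes.docinfo element.
    doctree = publish_doctree(source)
    # findall() replaced traverse() in newer Docutils; support both.
    find = getattr(doctree, 'findall', None) or doctree.traverse
    for node in list(find(nodes.docinfo)):
        node.parent.remove(node)
    writer = html5_polyglot.Writer()
    # Publishing fills writer.parts (the Publisher calls assemble_parts()).
    publish_from_doctree(doctree, writer=writer)
    return writer.parts['fragment']


print(html5_fragment_without_docinfo(SOURCE))
```

Because the node is removed from the tree itself, the writer never sees it, so this does not depend on how a particular writer version splits the output into parts.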

@andref's answer is great too; thanks for pointing me to this package.

Upvotes: 1

andref

Reputation: 750

I suggest using the writer from rst2html5 instead of the HTML5 writer bundled with Docutils:

from rst2html5_ import HTML5Writer
from docutils.core import publish_parts

SOURCE = '''\
:key: value

Title
========

Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod
tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At
vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren,
no sea takimata sanctus est Lorem ipsum dolor sit amet.
'''

parts = publish_parts(writer=HTML5Writer(), source=SOURCE)
print(parts['body'])

This is the result:

<section id="title">
    <h1>Title</h1>
    <p>Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.</p>
</section>

Upvotes: 1
