AndrasK

How to convert Markdown mixed with HTML to HTML/DOCX/PDF?

I'm working in an Azure DevOps wiki to create specifications and other software documentation.

I have to create tables in which some cells of the details column contain bulleted lists. This is possible in GitHub Flavored Markdown (specifically, in Azure DevOps):

```markdown
#header1

|TableHeader1|TableHeader2|
|--|--|
|Text1|Details 1|
|ListCell|<ul><li>FirstBullet</li><li>SecondBullet</li></ul>|
```

[Screenshot of the HTML output]

I tried Pandoc first, but the list falls out of the table. Any idea how to convert this into HTML/DOCX?

Regards, Andras

Answers (1)

Waylan

You probably can't. As the Pandoc documentation warns:

Because pandoc’s intermediate representation of a document is less expressive than many of the formats it converts between, one should not expect perfect conversions between every format and every other. Pandoc attempts to preserve the structural elements of a document, but not formatting details such as margin size. And some document elements, such as complex tables, may not fit into pandoc’s simple document model. While conversions from pandoc’s Markdown to all formats aspire to be perfect, conversions from formats more expressive than pandoc’s Markdown can be expected to be lossy.

HTML is certainly more expressive than Markdown, so Pandoc does not guarantee that HTML source will be preserved. That said, a simple list can be expressed in Markdown just fine, so one would expect it to survive the conversion.

However, the table complicates things. Pandoc supports four different table formats (simple, multiline, grid, and pipe tables), but only two of them (multiline and grid tables) support cells that contain block-level elements.

You appear to be using pipe tables (the pipe_tables extension), which do not support block-level elements within table cells. As the documentation states:

The cells of pipe tables cannot contain block elements like paragraphs and lists, and cannot span multiple lines.

While all of the above extensions (table formats) are supported by Pandoc's markdown format, only pipe_tables is supported by the gfm format (see Markdown Variants). Therefore, you might consider using the markdown format instead. However, that will only help if your table actually uses the proper syntax for grid or multiline tables.
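
For example, the table from the question could be written as a grid table, with the list as ordinary Markdown inside the cell. Grid tables require the | and + borders to line up exactly, which is tedious by hand, so here is a minimal Python sketch that pads the cells for you. The grid_table helper is hypothetical, written only for this illustration; each body cell is given as a list of lines so a cell can hold a multi-line bullet list.

```python
from typing import List

Cell = List[str]  # one cell = its lines of Markdown text
Row = List[Cell]


def grid_table(header: List[str], rows: List[Row]) -> str:
    """Render a Pandoc-style grid table (hypothetical helper).

    Body cells are lists of lines, so one cell can hold a multi-line
    Markdown block such as a bullet list.
    """
    all_rows: List[Row] = [[[h] for h in header]] + rows
    # Each column is as wide as its longest line; padding is added below.
    widths = [
        max(len(line) for row in all_rows for line in row[c])
        for c in range(len(header))
    ]

    def rule(ch: str) -> str:
        # Border line, e.g. +------+------+ ("=" under the header row).
        return "+" + "+".join(ch * (w + 2) for w in widths) + "+"

    def render(row: Row) -> List[str]:
        # A row is as tall as its tallest cell; shorter cells get blanks.
        height = max(len(cell) for cell in row)
        lines = []
        for i in range(height):
            parts = [
                " " + (cell[i] if i < len(cell) else "").ljust(widths[c]) + " "
                for c, cell in enumerate(row)
            ]
            lines.append("|" + "|".join(parts) + "|")
        return lines

    out = [rule("-")] + render(all_rows[0]) + [rule("=")]
    for row in rows:
        out += render(row) + [rule("-")]
    return "\n".join(out)


if __name__ == "__main__":
    print(grid_table(
        ["TableHeader1", "TableHeader2"],
        [
            [["Text1"], ["Details 1"]],
            [["ListCell"], ["- FirstBullet", "- SecondBullet"]],
        ],
    ))
```

Running it prints:

```
+--------------+----------------+
| TableHeader1 | TableHeader2   |
+==============+================+
| Text1        | Details 1      |
+--------------+----------------+
| ListCell     | - FirstBullet  |
|              | - SecondBullet |
+--------------+----------------+
```

Fed to Pandoc's markdown reader (for example pandoc -f markdown -t docx table.md -o table.docx), a table in this form should keep the list inside the cell. Keep in mind that Azure DevOps will not render this syntax as a table.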

Unfortunately, as far as I know, grid and multiline tables are supported only by Pandoc; I'm not aware of any other Markdown implementation that supports them. Therefore, you cannot write a table with block-level elements in its cells that both Pandoc and other implementations will parse.

So why does the other implementation you are using handle a raw HTML list within a table cell fine? Presumably the parser is not very smart and blindly passes the raw HTML through unaltered. A more sophisticated parser that attempts to understand the raw HTML would not work for you. And, of course, if you want to convert the document to a format other than HTML, the parser needs to understand the raw HTML.
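
To illustrate that guess, here is a minimal sketch of such a "not very smart" converter in Python. It is hypothetical and only an illustration, not how Azure DevOps actually works internally: it splits each pipe-table row on | and copies the cell text into <td> elements verbatim, so the raw <ul> list survives only because nothing ever parses it.

```python
from typing import List


def naive_pipe_table_to_html(markdown: str) -> str:
    """Hypothetical, deliberately naive pipe-table to HTML converter.

    Cell text is copied into <th>/<td> verbatim: nothing is parsed as
    Markdown or HTML, so raw HTML such as <ul>...</ul> passes through.
    """
    lines = [line.strip() for line in markdown.strip().splitlines()]

    def cells(line: str) -> List[str]:
        # Drop the outer pipes, then split on the remaining ones.
        # (A "|" inside a cell's HTML would break this split.)
        return [cell.strip() for cell in line.strip("|").split("|")]

    header = cells(lines[0])
    body = [cells(line) for line in lines[2:]]  # lines[1] is the |--|--| row

    html = ["<table>"]
    html.append("<tr>" + "".join(f"<th>{c}</th>" for c in header) + "</tr>")
    for row in body:
        html.append("<tr>" + "".join(f"<td>{c}</td>" for c in row) + "</tr>")
    html.append("</table>")
    return "\n".join(html)


if __name__ == "__main__":
    table = """
|TableHeader1|TableHeader2|
|--|--|
|Text1|Details 1|
|ListCell|<ul><li>FirstBullet</li><li>SecondBullet</li></ul>|
"""
    print(naive_pipe_table_to_html(table))
```

A tool built like this never knows that the cell contains a list, so it has no list structure to hand to a DOCX or PDF writer.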

Maybe you could find some parser that does what you want, but it is unlikely. A better solution might be to take the HTML output of your other Markdown tool and use Pandoc (or another tool) to convert that to DOCX/PDF.
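
A minimal sketch of that second step in Python, assuming pandoc is installed and on PATH and that the HTML from your other tool has been saved to a file. The helper names and file names are hypothetical. -f html tells Pandoc to parse the input as HTML rather than Markdown, and Pandoc infers the output format from the extension of the -o file.

```python
import shutil
import subprocess
from typing import List


def pandoc_command(src: str, dest: str) -> List[str]:
    """Build a Pandoc command that reads HTML (hypothetical helper)."""
    # -f html: parse the input as HTML rather than Markdown.
    # -o dest: Pandoc picks the output format from dest's extension.
    return ["pandoc", "-f", "html", src, "-o", dest]


def convert(src: str, dest: str) -> None:
    """Run Pandoc on an HTML file; raises if pandoc is missing or fails."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc is not on PATH")
    # check=True turns a non-zero exit status into CalledProcessError.
    subprocess.run(pandoc_command(src, dest), check=True)
```

Calling convert("wiki-page.html", "wiki-page.docx") would then write a DOCX file. For PDF output, Pandoc additionally needs a PDF engine (LaTeX by default; --pdf-engine selects another, such as wkhtmltopdf).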
