Constantine

Reputation: 228

Please suggest an intermediate file format for converting between PDF, DOC, RTF and HTML

I'm going to write some converters.

I thought HTML would be the best for that. For example:
- first, I create HTML -> PDF
- second, I create DOC -> HTML (and get DOC -> PDF as well)
...so I will get 3 conversions from only 2 converters.
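
To illustrate the idea, here is a minimal Python sketch of routing every conversion through one hub format. The converter functions are hypothetical stubs that only wrap strings (real ones would parse and render documents), and the names `READERS`, `WRITERS` and `convert` are made up for this example.

```python
from typing import Callable, Dict

# Hypothetical stub converters: they only tag the input string so the
# conversion path is visible. Real converters would parse and render files.
def doc_to_html(data: str) -> str:
    return f"<html from DOC: {data}>"

def html_to_pdf(data: str) -> str:
    return f"<pdf from {data}>"

# The intermediate (hub) format that every conversion passes through.
HUB = "html"

# Readers turn a source format into the hub; writers turn the hub into a target.
READERS: Dict[str, Callable[[str], str]] = {"doc": doc_to_html}
WRITERS: Dict[str, Callable[[str], str]] = {"pdf": html_to_pdf}

def convert(data: str, src: str, dst: str) -> str:
    """Convert src -> dst via the hub, skipping a step when src or dst is the hub."""
    if src == dst:
        return data
    intermediate = data if src == HUB else READERS[src](data)
    return intermediate if dst == HUB else WRITERS[dst](intermediate)

# Two converters written, three conversions available.
print(convert("report", "doc", "html"))
print(convert("report", "html", "pdf"))
print(convert("report", "doc", "pdf"))
```

With this layout, adding RTF support would mean writing one reader (RTF -> HTML) and RTF -> PDF would come for free.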

What intermediate format can you suggest? (Would XML be better for my task? If so, how would I preserve formatting styles?)

Thanks in advance.

Upvotes: 0

Views: 176

Answers (2)

RedGrittyBrick

Reputation: 4002

HTML as an intermediate language has its limitations: you need to supplement it with CSS to capture presentational aspects. Separation of content and presentation is useful, though.

Have you considered using a plain-text markup format such as MultiMarkdown or Textile?

Otherwise I would suspect that something like LaTeX or RTF would allow you to capture more of the presentation layout.

Many applications already exist that do what you describe, for example Pandoc.
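
As a rough illustration, Pandoc can be driven from a script. The sketch below assumes the `pandoc` executable is installed and on the PATH; the helper names and the file names are hypothetical. Note that Pandoc reads DOCX rather than the legacy binary DOC format, does not read PDF as input, and needs an additional PDF engine (such as a LaTeX installation) to write PDF.

```python
import shutil
import subprocess
from typing import List

def build_pandoc_command(src: str, dst: str) -> List[str]:
    # Pandoc infers the input and output formats from the file extensions.
    return ["pandoc", src, "-o", dst]

def convert_with_pandoc(src: str, dst: str) -> None:
    # Fail early with a clear message if pandoc is not available.
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc is not installed or not on PATH")
    # check=True raises CalledProcessError if pandoc exits with an error.
    subprocess.run(build_pandoc_command(src, dst), check=True)

# Example call (hypothetical file names):
# convert_with_pandoc("report.docx", "report.html")
```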

Upvotes: 1

shybovycha

Reputation: 12265

I think XML is the best intermediate format for any conversion. Alternatively, you could define your own text or binary format.

Upvotes: 0
