Mulone

Reputation: 3663

Wikitext to XML

Is there a way to convert wikitext data into simple XML in a Java application?

Input example:

== A section ==
this is some text...

{{MyTemplate
|attr1=some value
|attr2=some other value
...

Output example:

<section title='A section'>this is some text...</section>
<ValueDescription attr1='some value' attr2='some other value' ...>

It seems like a trivial task, but I couldn't find a Java library that does it.

Upvotes: 0

Views: 557

Answers (2)

sleeplessnerd

Reputation: 22781

Sweble (http://sweble.org/wiki/Wikitext-parser/) has a properly done wikitext parser, but I think there is no XML output for the AST yet.

@Tgr: Syntactically, wikitext is not really compatible with a tree, but semantically it is.

And yes, handling Wikitext is a huge mess.

Upvotes: 0

Tgr

Reputation: 28200

XML has a tree structure; wikitext, for the most part, does not. For example, this is fully legal:

== A section {{DoubleEqual{{echo|Sign}}}}

The template syntax itself is hierarchical, and MediaWiki itself transforms it to XML (you can use Special:ExpandTemplates to check it out), but the rest of the syntax is much too loose for XML or other formal descriptions like a context-free grammar.

There is a rewrite effort going on to turn wikitext into a standard, parseable language, but don't expect it to end anytime soon.
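For the narrow subset shown in the question (a level-2 `== heading ==` followed by plain text, and a flat template with `name=value` parameters), a hand-rolled converter is feasible. The following is only a minimal sketch under those assumptions, not a general wikitext parser: the class name `WikitextSketch` and its methods are made up for illustration, nested templates such as the `{{DoubleEqual{{echo|Sign}}}}` example above are not handled, and the template name is assumed to already be a valid XML element name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any wikitext library.
final class WikitextSketch {

    // A level-2 heading on its own line, e.g. "== A section ==".
    private static final Pattern HEADING =
            Pattern.compile("^==\\s*(.+?)\\s*==\\s*$");

    // Escapes characters that are unsafe in XML text and single-quoted attributes.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("'", "&apos;");
    }

    // Turns "Name|k1=v1|k2=v2" (template body without the braces) into an empty element.
    static String templateToXml(String body) {
        String[] parts = body.split("\\|");
        StringBuilder sb = new StringBuilder("<").append(parts[0].trim());
        for (int i = 1; i < parts.length; i++) {
            String[] kv = parts[i].split("=", 2);
            if (kv.length == 2) { // positional (unnamed) parameters are skipped
                sb.append(' ').append(kv[0].trim())
                  .append("='").append(escape(kv[1].trim())).append('\'');
            }
        }
        return sb.append("/>").toString();
    }

    // Emits one <section> element, or nothing if there is neither a title nor text.
    static String sectionToXml(String title, CharSequence text) {
        if (title == null && text.length() == 0) return "";
        String attr = title == null ? "" : " title='" + escape(title) + "'";
        return "<section" + attr + ">" + escape(text.toString()) + "</section>\n";
    }

    static String convert(String wikitext) {
        StringBuilder out = new StringBuilder();
        StringBuilder text = new StringBuilder();
        StringBuilder template = null; // non-null while inside {{ ... }}
        String title = null;

        for (String raw : wikitext.split("\\R")) {
            String line = raw.trim();
            if (template == null && line.startsWith("{{")) {
                // A template at the start of a line closes the current section.
                out.append(sectionToXml(title, text));
                title = null;
                text.setLength(0);
                template = new StringBuilder();
                line = line.substring(2);
            }
            if (template != null) {
                int end = line.indexOf("}}");
                if (end >= 0) { // anything after "}}" on this line is dropped
                    template.append(line, 0, end);
                    out.append(templateToXml(template.toString())).append('\n');
                    template = null;
                } else {
                    template.append(line).append('\n');
                }
                continue;
            }
            Matcher m = HEADING.matcher(line);
            if (m.matches()) {
                out.append(sectionToXml(title, text));
                title = m.group(1);
                text.setLength(0);
            } else if (!line.isEmpty()) {
                if (text.length() > 0) text.append(' ');
                text.append(line);
            }
        }
        // An unterminated template is silently ignored in this sketch.
        out.append(sectionToXml(title, text));
        return out.toString();
    }
}
```

Calling `WikitextSketch.convert(...)` on the question's input (with the template closed by `}}`) yields a `<section>` element followed by a `<MyTemplate .../>` element. Anything beyond this subset, in particular markup produced by template expansion, needs a real parser.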

Upvotes: 1
