Hooked

Reputation: 88118

Markdown syntax with pyparsing, getting spaces correct

As a learning exercise, I'm writing a small program that converts a reduced Markdown syntax to HTML, but I'm having trouble getting the spacing correct:

from pyparsing import *

strong  = QuotedString("**")
text    = Word(printables)
tokens  = strong | text
grammar = OneOrMore(tokens)

strong.setParseAction(lambda x:"<strong>%s</strong>"%x[0])

A = "The **cat** in the **hat**."
print(' '.join(grammar.parseString(A)))

What I get:

The <strong>cat</strong> in the <strong>hat</strong> .

What I would like:

The <strong>cat</strong> in the <strong>hat</strong>.

Yes, this can be done without pyparsing, and other utilities (e.g. pandoc) already do exactly this, but I would like to know how to do it with pyparsing.

Upvotes: 3

Views: 509

Answers (1)

Birei

Reputation: 36262

I'm not very skilled with pyparsing, but I would try using transformString() instead of parseString(), and calling leaveWhitespace() on the matched tokens, like this:

from pyparsing import *

strong  = QuotedString("**").leaveWhitespace()
text    = Word(printables).leaveWhitespace()
tokens  = strong | text
grammar = OneOrMore(tokens)

strong.setParseAction(lambda x:"<strong>%s</strong>"%x[0])

A = "The **cat** in the **hat**."
print(grammar.transformString(A))

It yields:

The <strong>cat</strong> in the <strong>hat</strong>.

UPDATE: Improved version pointed out by Paul McGuire (see comments):

from pyparsing import *

strong  = QuotedString("**")

strong.setParseAction(lambda x:"<strong>%s</strong>"%x[0])

A = "The **cat** in the **hat**."
print(strong.transformString(A))

Upvotes: 3
