Gordon Mckeown

Reputation: 752

Converting HTML to Jira markup using only regex

I'm using a tool that doesn't have any specific HTML parsing capability. It does have regex replace functionality (based on the Boost library), and I'm able to use that to convert a lot of the formatting. I understand that it's imperfect, but it's "good enough".

Lists are proving a bit trickier. I know that I could use some script code to iterate through these, but given the power of regular expressions, I feel that it should be possible.

My input can contain something like this:

<p>Numbered bullet list:</p>
<ol>
    <li>Item 1</li>
    <li>Item 2</li>
    <li>Item 3</li>
</ol>
<p>Standard bullet list:</p>
<ul>
    <li>Item A</li>
    <li>Item B</li>
    <li>Item C</li>
    <li>Item D</li>
</ul>

And I'd like to convert this to:

Numbered bullet list:
    # Item 1
    # Item 2
    # Item 3
Standard bullet list:
    * Item A
    * Item B
    * Item C
    * Item D

I already have a first pass that will remove the paragraph tags, so these can be ignored for the purpose of this question. If I only have one list type, I can do a simple replace of the list tags. Is it possible to do the conversion for text containing both list types using only regexes?

Thanks!

Upvotes: 0

Views: 128

Answers (1)

l0wlik34G6

Reputation: 11

Technically this is possible, but the catch is that each replacement only converts one item per list, and you might not know how many items you have.

You could match something like <ol>([\s\w<>/]+)<li>([^<]+)<\/li> and replace it with <ol>$1# $2. If you execute that n times (where n is at least the number of items in the longest list), the first list is basically done.

The same goes for the <ul>: replace <ul>([\s\w<>/]+)<li>([^<]+)<\/li> with <ul>$1* $2. After that you have something like this:

<p>Numbered bullet list:</p>
<ol>
    # Item 1
    # Item 2
    # Item 3
</ol>
<p>Standard bullet list:</p>
<ul>
    * Item A
    * Item B
    * Item C
    * Item D
</ul>

Then you can remove the start and end tags of the lists.
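
The whole procedure can be sketched in Python, purely as an illustration. The original tool uses Boost regex, so the replacement syntax differs (Python writes \1 where the answer writes $1), and the helper names replace_until_stable and convert_lists are made up for this example. Instead of guessing n, the sketch repeats each replacement until the text stops changing.

```python
import re

# Patterns from the answer: the list's opening tag, a run of whitespace,
# word and tag characters, then an unconverted <li>...</li>.
OL_ITEM = re.compile(r"<ol>([\s\w<>/]+)<li>([^<]+)</li>")
UL_ITEM = re.compile(r"<ul>([\s\w<>/]+)<li>([^<]+)</li>")


def replace_until_stable(pattern, replacement, text):
    """Repeat a global replace until a pass changes nothing (hypothetical helper)."""
    while True:
        new_text = pattern.sub(replacement, text)
        if new_text == text:
            return text
        text = new_text


def convert_lists(html):
    """Convert <ol>/<ul> items to Jira '#' and '*' markers, then drop the list tags."""
    text = replace_until_stable(OL_ITEM, r"<ol>\1# \2", html)
    text = replace_until_stable(UL_ITEM, r"<ul>\1* \2", text)
    # Remove the list tags together with the line break that follows each one.
    return re.sub(r"[ \t]*</?[ou]l>[ \t]*\n?", "", text)
```

Two limitations of this pattern are worth knowing. The character class does not stop at </ol>, so if the text between two lists contains only word characters and whitespace (no colon, for instance), the <ol> pattern can run on into a following <ul> and mark its items with #. An item containing characters outside the class, such as punctuation, also keeps the later items in that list from being reached.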

PS: The replacement syntax ($1) might vary depending on the tool you use.
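
If the tool's regex flavour supports lookahead (an assumption; Boost's Perl-style syntax does, but check how your tool is configured), a different technique avoids the repeated passes: match each <li> only when the next list tag after it is the matching closing tag. The sketch below is in Python for illustration, with hypothetical helper names item_in and convert_lists_with_lookahead. It assumes lists are not nested and that items contain no inner tags or attributes.

```python
import re


def item_in(tag):
    """Pattern for an <li>...</li> whose next list tag is </tag> (hypothetical helper)."""
    # The lookahead skips any characters that do not start an <ol>, </ol>,
    # <ul> or </ul> tag, and then requires the closing tag of the given list.
    return re.compile(
        r"<li>([^<]*)</li>(?=(?:(?!</?[ou]l>)[\s\S])*</" + tag + r">)"
    )


def convert_lists_with_lookahead(html):
    """One global replace per list type, then remove the list tags."""
    text = item_in("ol").sub(r"# \1", html)
    text = item_in("ul").sub(r"* \1", text)
    # Remove the list tags together with the line break that follows each one.
    return re.sub(r"[ \t]*</?[ou]l>[ \t]*\n?", "", text)
```

Because the lookahead stops at the first list tag it meets, items in a <ul> are never matched by the <ol> pattern, regardless of what text sits between the lists or inside the items.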

Upvotes: 1
