Justin01

Reputation: 288

Convert custom markdown to HTML?

Challenge: Our users have access to a "contentEditable" DIV into which a JS library inserts HTML. Here's how we thought the HTML should appear in the contentEditable:

<span class="stylish-blue-button">

   <span style="display:none;">[data-user="12345" data-userId="678910"]</span>

     John Smith

   <span style="display:none;">[/]</span>

</span>

...Blablabla some other text...

We pass this HTML to PHP, where we run strip_tags(). This should give us:

[data-user="12345" data-userId="678910"]John Smith[/] ...Blablabla some other text...

Question: When rendering the text on the page, is there a secure and reliable way to convert the custom markdown above into the following, before handing it to Handlebars.js?

<span class="stylish-blue-button" data-user="12345" data-userId="678910">John Smith</span> ...Blablabla some other text...

Why: This assures us that the user-generated content is handled safely, while keeping the user-generated markdown in the contentEditable "pretty" (the "stylish-blue-button" class).

If you have any suggestions to make this whole process simpler, we're open to changing our markdown's format.

Thank you so much!

Upvotes: 2

Views: 315

Answers (1)

chris85

Reputation: 23892

You could use a regex like this:

$string = '<span class="stylish-blue-button">

   <span style="display:none;">[data-user="12345" data-userId="678910"]</span>

     John Smith

   <span style="display:none;">[/]</span>

</span>

...Blablabla some other text...';
echo preg_replace('~\[(data-user="\d+")\h+(data-userId="\d+")\]\s*(.+?)\s*\[/\]\s*(.*)~s', '<span $1 $2>$3</span>$4', trim(strip_tags($string)));

Here's a regex101 demo explaining exactly what that regex is doing. If you have any particular questions, please ask.

Output:

<span data-user="12345" data-userId="678910">John Smith</span>...Blablabla some other text...

A few quick regex notes.

* is a quantifier meaning zero or more of the preceding token.
+ is a quantifier meaning one or more of the preceding token (so at least one is required).
\s is a whitespace character (including newlines).
\h is a horizontal whitespace character (such as a space or tab, but not a newline).
. is any single character; with the s modifier used here, it also matches newlines.
\d is a single digit (0-9).
() are capturing groups; they capture into $1, $2, etc., in the order in which they open.

One more note on that regex: \[/\] is read as a literal [/]. The backslashes escape the square brackets, which would otherwise create a character class (one that matches only a single / character).
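To see the capturing groups in action, here is a minimal sketch using preg_match() on text that has already been through strip_tags(). The sample string and the variable names ($text, $pattern, $m) are only for illustration. The lazy quantifier +? in the third group makes it stop at the first [/] it can.

```php
<?php
// Sample input as it might look after strip_tags() and trim() (illustrative).
$text = '[data-user="12345" data-userId="678910"]John Smith[/] ...Blablabla some other text...';

// Same pattern as above, without the trailing (.*) group.
$pattern = '~\[(data-user="\d+")\h+(data-userId="\d+")\]\s*(.+?)\s*\[/\]~s';

if (preg_match($pattern, $text, $m)) {
    // $m[0] holds the whole match; $m[1] to $m[3] hold the capturing groups.
    echo $m[1], "\n"; // data-user="12345"
    echo $m[2], "\n"; // data-userId="678910"
    echo $m[3], "\n"; // John Smith
}
```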

Multi-instances:

$string = '<span class="stylish-blue-button">

   <span style="display:none;">[data-user="12345" data-userId="678910"]</span>

     John Smith

   <span style="display:none;">[/]</span>

</span>

...Blablabla some other text...
<span class="stylish-blue-button">

   <span style="display:none;">[data-user="12345" data-userId="678910"]</span>

     John Smith

   <span style="display:none;">[/]</span>

</span>

...Blablabla some other text...
<span class="stylish-blue-button">

   <span style="display:none;">[data-user="12345" data-userId="678910"]</span>

     John Smith

   <span style="display:none;">[/]</span>

</span>

...Blablabla some other text...';
echo preg_replace('~\s*\[(data-user="\d+")\h+(data-userId="\d+")\]\s*(.+?)\s*\[/\]\s*~s', '<span $1 $2>$3</span>', trim(strip_tags($string)));

Output:

<span data-user="12345" data-userId="678910">John Smith</span>...Blablabla some other text...<span data-user="12345" data-userId="678910">John Smith</span>...Blablabla some other text...<span data-user="12345" data-userId="678910">John Smith</span>...Blablabla some other text...
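The reason the multi-instance pattern drops the trailing (.*) is that, with the s modifier, that group swallows the rest of the string, so only the first tag ever gets converted. A small sketch to check this, using a shortened made-up input and the optional count argument of preg_replace():

```php
<?php
// Shortened, made-up input containing two tags.
$text = '[data-user="1" data-userId="2"]Ann[/] and [data-user="3" data-userId="4"]Bob[/]';

// Single-instance pattern: the trailing (.*) consumes everything after the first tag.
$single = '~\[(data-user="\d+")\h+(data-userId="\d+")\]\s*(.+?)\s*\[/\]\s*(.*)~s';
// Multi-instance pattern: no trailing (.*), so each tag is matched on its own.
$multi = '~\s*\[(data-user="\d+")\h+(data-userId="\d+")\]\s*(.+?)\s*\[/\]\s*~s';

// The fifth argument receives the number of replacements performed.
preg_replace($single, '<span $1 $2>$3</span>$4', $text, -1, $singleCount);
preg_replace($multi, '<span $1 $2>$3</span>', $text, -1, $multiCount);

echo $singleCount, "\n"; // 1
echo $multiCount, "\n";  // 2
```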

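For looser IDs, just change the \d+ to [a-zA-Z0-9 ]+ (letters, digits, and spaces).

So, for data-userId:

echo preg_replace('~\s*\[(data-user="\d+")\h+(data-userId="[a-zA-Z0-9 ]+")\]\s*(.+?)\s*\[/\]\s*~s', '<span $1 $2>$3</span>', trim(strip_tags($string)));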
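If you also want the "stylish-blue-button" class on the output and some extra safety on the captured values, one option is preg_replace_callback() with htmlspecialchars(). This is only a sketch: renderUserTags() is a hypothetical helper name, and it captures just the ID values rather than the whole attribute so each value can be escaped separately. Text outside the tags is left exactly as strip_tags() returned it, so it may still need escaping depending on how you pass it to Handlebars.js.

```php
<?php
// Hypothetical helper (not part of the answer above): converts every
// [data-user="..." data-userId="..."]Name[/] tag into a <span>,
// escaping each captured value before it is written back into HTML.
function renderUserTags(string $html): string
{
    $pattern = '~\s*\[data-user="(\d+)"\h+data-userId="(\d+)"\]\s*(.+?)\s*\[/\]\s*~s';

    $result = preg_replace_callback(
        $pattern,
        function (array $m): string {
            return sprintf(
                '<span class="stylish-blue-button" data-user="%s" data-userId="%s">%s</span>',
                htmlspecialchars($m[1], ENT_QUOTES, 'UTF-8'),
                htmlspecialchars($m[2], ENT_QUOTES, 'UTF-8'),
                htmlspecialchars($m[3], ENT_QUOTES, 'UTF-8')
            );
        },
        trim(strip_tags($html))
    );

    // preg_replace_callback() returns null on a PCRE error; fall back to an empty string.
    return $result ?? '';
}

$output = renderUserTags(
    '<span class="stylish-blue-button"><span style="display:none;">[data-user="12345" data-userId="678910"]</span>'
    . 'John Smith<span style="display:none;">[/]</span></span> ...Blablabla some other text...'
);
echo $output, "\n";
```

With the stricter \d+ patterns the escaping of the IDs is redundant, but it starts to matter once you loosen the ID pattern or allow arbitrary display names.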

Upvotes: 2
